
[v2] pseries/iommu: remove iommu device references via bus notifier

Message ID 20150221190050.GA20184@linux.vnet.ibm.com (mailing list archive)
State Accepted
Delegated to: Michael Ellerman

Commit Message

Nishanth Aravamudan Feb. 21, 2015, 7 p.m. UTC
On 20.02.2015 [15:31:29 +1100], Michael Ellerman wrote:
> On Thu, 2015-02-19 at 10:41 -0800, Nishanth Aravamudan wrote:
> > After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
> > refcnt on the kobject backing the IOMMU group for a PCI device is
> > elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
> > set_iommu_table_base_and_group). When we go to dlpar a multi-function
> > PCI device out:
> > 
> > 	iommu_reconfig_notifier ->
> > 		iommu_free_table ->
> > 			iommu_group_put
> > 			BUG_ON(tbl->it_group)
> > 
> > We trip this BUG_ON, because there are still references on the table, so
> > it is not freed. Fix this by also adding a bus notifier identical to
> > PowerNV for pSeries.
> 
> Please put it somewhere common, arch/powerpc/kernel/iommu.c perhaps, and just
> add a second machine_init_call() for pseries.

How does this look? Only compile-tested with CONFIG_IOMMU_API on/off so
far, waiting for access to the test LPAR (should have it on Monday).


After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
refcnt on the kobject backing the IOMMU group for a PCI device is
elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
set_iommu_table_base_and_group). When we go to dlpar a multi-function
PCI device out:

        iommu_reconfig_notifier ->
                iommu_free_table ->
                        iommu_group_put
                        BUG_ON(tbl->it_group)

We trip this BUG_ON, because there are still references on the table, so
it is not freed. Fix this by moving the PowerNV bus notifier to common
code and calling it for both PowerNV and pSeries.

Fixes: d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: stable@kernel.org (3.13+)

---
v1 -> v2:
  Move powernv code to common file, just add machine_init_call for pseries.
  Suggested by Michael Ellerman.

Michael, I'll send another update once I have testing results.
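
The refcount imbalance described in the commit message can be illustrated with a small userspace model. This is only a sketch: every `model_*` name is hypothetical and none of it is kernel code. It assumes the table holds one reference on the group and that each PCI function takes one more when it is added, which is how the commit message describes the behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the IOMMU group; not a kernel structure. */
struct model_group {
	int refcount;	/* models the refcount on the backing kobject */
	bool released;	/* set once the last reference has been dropped */
};

static void model_group_get(struct model_group *g)
{
	g->refcount++;
}

static void model_group_put(struct model_group *g)
{
	assert(g->refcount > 0);
	if (--g->refcount == 0)
		g->released = true;
}

/* Models iommu_add_device(): every added function pins the group. */
static void model_add_device(struct model_group *g)
{
	model_group_get(g);
}

/* Models iommu_del_device(): the matching unpin, run by the bus notifier. */
static void model_del_device(struct model_group *g)
{
	model_group_put(g);
}

/*
 * Models the table teardown: drop the table's own reference and report
 * whether the group was actually released. The real code hits
 * BUG_ON(tbl->it_group) when it was not.
 */
static bool model_free_table(struct model_group *g)
{
	model_group_put(g);
	return g->released;
}

/* Remove a device with nfuncs functions, with or without the notifier. */
static bool model_remove(int nfuncs, bool with_notifier)
{
	struct model_group g = { .refcount = 1, .released = false };
	int i;

	for (i = 0; i < nfuncs; i++)
		model_add_device(&g);

	if (with_notifier)
		for (i = 0; i < nfuncs; i++)
			model_del_device(&g);

	return model_free_table(&g);
}
```

In this model `model_remove(2, false)` returns false, because the per-function references are never dropped, while `model_remove(2, true)` returns true, because the notifier's delete path balances them before the table is freed.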

Comments

Michael Ellerman Feb. 23, 2015, 2:27 a.m. UTC | #1
On Sat, 2015-02-21 at 19:00:50 UTC, Nishanth Aravamudan wrote:
> On 20.02.2015 [15:31:29 +1100], Michael Ellerman wrote:
> > On Thu, 2015-02-19 at 10:41 -0800, Nishanth Aravamudan wrote:
> > > After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
> > > refcnt on the kobject backing the IOMMU group for a PCI device is
> > > elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
> > > set_iommu_table_base_and_group). When we go to dlpar a multi-function
> > > PCI device out:
> > > 
> > > 	iommu_reconfig_notifier ->
> > > 		iommu_free_table ->
> > > 			iommu_group_put
> > > 			BUG_ON(tbl->it_group)
> > > 
> > > We trip this BUG_ON, because there are still references on the table, so
> > > it is not freed. Fix this by also adding a bus notifier identical to
> > > PowerNV for pSeries.
> > 
> > Please put it somewhere common, arch/powerpc/kernel/iommu.c perhaps, and just
> > add a second machine_init_call() for pseries.
> 
> How does this look? Only compile-tested with CONFIG_IOMMU_API on/off so
> far, waiting for access to the test LPAR (should have it on Monday).

Yeah that looks better, thanks.

It probably doesn't build with CONFIG_PCI=n though, but I don't think
CONFIG_PCI=n builds anyway.

cheers

Nishanth Aravamudan Feb. 23, 2015, 6:54 p.m. UTC | #2
On 23.02.2015 [13:27:24 +1100], Michael Ellerman wrote:
> On Sat, 2015-02-21 at 19:00:50 UTC, Nishanth Aravamudan wrote:
> > On 20.02.2015 [15:31:29 +1100], Michael Ellerman wrote:
> > > On Thu, 2015-02-19 at 10:41 -0800, Nishanth Aravamudan wrote:
> > > > After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
> > > > refcnt on the kobject backing the IOMMU group for a PCI device is
> > > > elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
> > > > set_iommu_table_base_and_group). When we go to dlpar a multi-function
> > > > PCI device out:
> > > > 
> > > > 	iommu_reconfig_notifier ->
> > > > 		iommu_free_table ->
> > > > 			iommu_group_put
> > > > 			BUG_ON(tbl->it_group)
> > > > 
> > > > We trip this BUG_ON, because there are still references on the table, so
> > > > it is not freed. Fix this by also adding a bus notifier identical to
> > > > PowerNV for pSeries.
> > > 
> > > Please put it somewhere common, arch/powerpc/kernel/iommu.c perhaps, and just
> > > add a second machine_init_call() for pseries.
> > 
> > How does this look? Only compile-tested with CONFIG_IOMMU_API on/off so
> > far, waiting for access to the test LPAR (should have it on Monday).
> 
> Yeah that looks better, thanks.
> 
> It probably doesn't build with CONFIG_PCI=n though, but I don't think
> CONFIG_PCI=n builds anyway.

Indeed it doesn't. Started looking at CONFIG_PCI=n and immediately hit
the following:

PCI_MSI depends on PCI

PCI can be manually turned off

PSERIES (and a bunch of other platforms) select PCI_MSI

So you end up with PCI_MSI on and PCI off and the build breaks.

Should the platforms depend on PCI_MSI instead?

Per the Documentation:
"        select should be used with care. select will force
        a symbol to a value without visiting the dependencies.
        By abusing select you are able to select a symbol FOO even
        if FOO depends on BAR that is not set."

Thanks,
Nish

Nishanth Aravamudan Feb. 23, 2015, 8:44 p.m. UTC | #3
On 21.02.2015 [11:00:50 -0800], Nishanth Aravamudan wrote:
> On 20.02.2015 [15:31:29 +1100], Michael Ellerman wrote:
> > On Thu, 2015-02-19 at 10:41 -0800, Nishanth Aravamudan wrote:
> > > After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
> > > refcnt on the kobject backing the IOMMU group for a PCI device is
> > > elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
> > > set_iommu_table_base_and_group). When we go to dlpar a multi-function
> > > PCI device out:
> > > 
> > > 	iommu_reconfig_notifier ->
> > > 		iommu_free_table ->
> > > 			iommu_group_put
> > > 			BUG_ON(tbl->it_group)
> > > 
> > > We trip this BUG_ON, because there are still references on the table, so
> > > it is not freed. Fix this by also adding a bus notifier identical to
> > > PowerNV for pSeries.
> > 
> > Please put it somewhere common, arch/powerpc/kernel/iommu.c perhaps, and just
> > add a second machine_init_call() for pseries.
> 
> How does this look? Only compile-tested with CONFIG_IOMMU_API on/off so
> far, waiting for access to the test LPAR (should have it on Monday).
> 
> 
> After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
> refcnt on the kobject backing the IOMMU group for a PCI device is
> elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
> set_iommu_table_base_and_group). When we go to dlpar a multi-function
> PCI device out:
> 
>         iommu_reconfig_notifier ->
>                 iommu_free_table ->
>                         iommu_group_put
>                         BUG_ON(tbl->it_group)
> 
> We trip this BUG_ON, because there are still references on the table, so
> it is not freed. Fix this by moving the PowerNV bus notifier to common
> code and calling it for both PowerNV and pSeries.

Survived a remove -> add -> remove cycle, which always resulted in the
BUG_ON without the change.

> Fixes: d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier")
> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Cc: stable@kernel.org (3.13+)

Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

Michael Ellerman Feb. 24, 2015, 4:42 a.m. UTC | #4
On Mon, 2015-02-23 at 10:54 -0800, Nishanth Aravamudan wrote:
> On 23.02.2015 [13:27:24 +1100], Michael Ellerman wrote:
> > On Sat, 2015-02-21 at 19:00:50 UTC, Nishanth Aravamudan wrote:
> > > On 20.02.2015 [15:31:29 +1100], Michael Ellerman wrote:
> > > > On Thu, 2015-02-19 at 10:41 -0800, Nishanth Aravamudan wrote:
> > > > > After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
> > > > > refcnt on the kobject backing the IOMMU group for a PCI device is
> > > > > elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
> > > > > set_iommu_table_base_and_group). When we go to dlpar a multi-function
> > > > > PCI device out:
> > > > > 
> > > > > 	iommu_reconfig_notifier ->
> > > > > 		iommu_free_table ->
> > > > > 			iommu_group_put
> > > > > 			BUG_ON(tbl->it_group)
> > > > > 
> > > > > We trip this BUG_ON, because there are still references on the table, so
> > > > > it is not freed. Fix this by also adding a bus notifier identical to
> > > > > PowerNV for pSeries.
> > > > 
> > > > Please put it somewhere common, arch/powerpc/kernel/iommu.c perhaps, and just
> > > > add a second machine_init_call() for pseries.
> > > 
> > > How does this look? Only compile-tested with CONFIG_IOMMU_API on/off so
> > > far, waiting for access to the test LPAR (should have it on Monday).
> > 
> > Yeah that looks better, thanks.
> > 
> > It probably doesn't build with CONFIG_PCI=n though, but I don't think
> > CONFIG_PCI=n builds anyway.
> 
> Indeed it doesn't. Started looking at CONFIG_PCI=n and immediately hit
> the following:
> 
> PCI_MSI depends on PCI
> 
> PCI can be manually turned off
> 
> PSERIES (and a bunch of other platforms) select PCI_MSI
> 
> So you end up with PCI_MSI on and PCI off and the build breaks.
> 
> Should the platforms depend on PCI_MSI instead?

No, they don't depend on it, they would just like it if PCI is enabled.

That can be fixed fairly easily by making it:

config PSERIES
	select PCI_MSI if PCI


But you then discover that there are ten other places where the build breaks
for PCI=n.

I'm starting to think we should just force PCI on for PSERIES and be done with
it, we could all spend less of our time chasing build breaks for configurations
no one actually cares about in practice (ie. PSERIES=y PCI=n).
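
The difference between a plain `select PCI_MSI` and `select PCI_MSI if PCI` can be sketched as a toy resolver in C. This is a deliberately simplified model with hypothetical `model_*` names. It is not how the Kconfig tooling is implemented, and it ignores the fact that PCI_MSI can also be enabled by hand.

```c
#include <stdbool.h>

/* How the platform symbol pulls in PCI_MSI in this model. */
enum model_select {
	MODEL_SELECT_ALWAYS,	/* "select PCI_MSI" */
	MODEL_SELECT_IF_PCI,	/* "select PCI_MSI if PCI" */
};

struct model_config {
	bool pci;	/* user-visible, may be turned off by hand */
	bool pseries;	/* platform symbol doing the select */
	bool pci_msi;	/* result of the select */
};

/*
 * A select forces its target on without consulting the target's own
 * "depends on" line; the conditional form only fires when PCI is set.
 */
static struct model_config model_resolve(bool pci, bool pseries,
					 enum model_select kind)
{
	struct model_config c = {
		.pci = pci,
		.pseries = pseries,
		.pci_msi = false,
	};

	if (pseries && (kind == MODEL_SELECT_ALWAYS || pci))
		c.pci_msi = true;

	return c;
}

/* "PCI_MSI depends on PCI": report whether that dependency still holds. */
static bool model_deps_ok(const struct model_config *c)
{
	return !c->pci_msi || c->pci;
}
```

With `pci` false and `pseries` true, `MODEL_SELECT_ALWAYS` yields PCI_MSI=y with PCI=n, so `model_deps_ok()` returns false. `MODEL_SELECT_IF_PCI` leaves PCI_MSI off and the dependency holds. As noted above, this only addresses the Kconfig inconsistency, not the other PCI=n build breaks.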

cheers

Patch

diff -urpN linux-3.19/arch/powerpc/include/asm/iommu.h linux-3.19-dev/arch/powerpc/include/asm/iommu.h
--- linux-3.19/arch/powerpc/include/asm/iommu.h	2015-02-08 18:54:22.000000000 -0800
+++ linux-3.19-dev/arch/powerpc/include/asm/iommu.h	2015-02-21 09:03:55.960995053 -0800
@@ -113,6 +113,7 @@  extern void iommu_register_group(struct
 				 int pci_domain_number, unsigned long pe_num);
 extern int iommu_add_device(struct device *dev);
 extern void iommu_del_device(struct device *dev);
+extern int __init tce_iommu_bus_notifier_init(void);
 #else
 static inline void iommu_register_group(struct iommu_table *tbl,
 					int pci_domain_number,
@@ -128,6 +129,11 @@  static inline int iommu_add_device(struc
 static inline void iommu_del_device(struct device *dev)
 {
 }
+
+static inline int __init tce_iommu_bus_notifier_init(void)
+{
+	return 0;
+}
 #endif /* !CONFIG_IOMMU_API */
 
 static inline void set_iommu_table_base_and_group(struct device *dev,
diff -urpN linux-3.19/arch/powerpc/kernel/iommu.c linux-3.19-dev/arch/powerpc/kernel/iommu.c
--- linux-3.19/arch/powerpc/kernel/iommu.c	2015-02-08 18:54:22.000000000 -0800
+++ linux-3.19-dev/arch/powerpc/kernel/iommu.c	2015-02-20 17:50:19.229927080 -0800
@@ -1175,4 +1175,30 @@  void iommu_del_device(struct device *dev
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
 
+static int tce_iommu_bus_notifier(struct notifier_block *nb,
+		unsigned long action, void *data)
+{
+	struct device *dev = data;
+
+	switch (action) {
+	case BUS_NOTIFY_ADD_DEVICE:
+		return iommu_add_device(dev);
+	case BUS_NOTIFY_DEL_DEVICE:
+		if (dev->iommu_group)
+			iommu_del_device(dev);
+		return 0;
+	default:
+		return 0;
+	}
+}
+
+static struct notifier_block tce_iommu_bus_nb = {
+	.notifier_call = tce_iommu_bus_notifier,
+};
+
+int __init tce_iommu_bus_notifier_init(void)
+{
+	bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
+	return 0;
+}
 #endif /* CONFIG_IOMMU_API */
diff -urpN linux-3.19/arch/powerpc/platforms/powernv/pci.c linux-3.19-dev/arch/powerpc/platforms/powernv/pci.c
--- linux-3.19/arch/powerpc/platforms/powernv/pci.c	2015-02-08 18:54:22.000000000 -0800
+++ linux-3.19-dev/arch/powerpc/platforms/powernv/pci.c	2015-02-20 17:50:33.917927464 -0800
@@ -866,30 +866,4 @@  void __init pnv_pci_init(void)
 #endif
 }
 
-static int tce_iommu_bus_notifier(struct notifier_block *nb,
-		unsigned long action, void *data)
-{
-	struct device *dev = data;
-
-	switch (action) {
-	case BUS_NOTIFY_ADD_DEVICE:
-		return iommu_add_device(dev);
-	case BUS_NOTIFY_DEL_DEVICE:
-		if (dev->iommu_group)
-			iommu_del_device(dev);
-		return 0;
-	default:
-		return 0;
-	}
-}
-
-static struct notifier_block tce_iommu_bus_nb = {
-	.notifier_call = tce_iommu_bus_notifier,
-};
-
-static int __init tce_iommu_bus_notifier_init(void)
-{
-	bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
-	return 0;
-}
 machine_subsys_initcall_sync(powernv, tce_iommu_bus_notifier_init);
diff -urpN linux-3.19/arch/powerpc/platforms/pseries/iommu.c linux-3.19-dev/arch/powerpc/platforms/pseries/iommu.c
--- linux-3.19/arch/powerpc/platforms/pseries/iommu.c	2015-02-08 18:54:22.000000000 -0800
+++ linux-3.19-dev/arch/powerpc/platforms/pseries/iommu.c	2015-02-20 17:51:23.265928866 -0800
@@ -1340,3 +1340,5 @@  static int __init disable_multitce(char
 }
 
 __setup("multitce=", disable_multitce);
+
+machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
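
For readers unfamiliar with the per-machine initcalls used above, the following userspace sketch shows why registering the same init function once for powernv and once for pseries should install the notifier only once on any given system. It assumes the machine initcall wrappers guard the call with a check of the running platform. All `model_*` and `MODEL_*` names are hypothetical and this is not the kernel macro itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: name of the platform this "boot" is running on. */
static const char *model_machine = "pseries";

/* Counts how many times the shared init function actually ran. */
static int model_notifier_registrations;

static bool model_machine_is(const char *name)
{
	return strcmp(model_machine, name) == 0;
}

/* Stand-in for the shared tce_iommu_bus_notifier_init(). */
static int model_bus_notifier_init(void)
{
	model_notifier_registrations++;
	return 0;
}

/* Generate a wrapper that runs fn only on the named machine. */
#define MODEL_MACHINE_INITCALL(mach, fn)			\
	static int model_initcall_##mach##_##fn(void)		\
	{							\
		return model_machine_is(#mach) ? fn() : 0;	\
	}

MODEL_MACHINE_INITCALL(powernv, model_bus_notifier_init)
MODEL_MACHINE_INITCALL(pseries, model_bus_notifier_init)

/* Run both wrappers, as a kernel built with both platforms would. */
static int model_boot(const char *machine)
{
	assert(machine != NULL);
	model_machine = machine;
	model_notifier_registrations = 0;

	model_initcall_powernv_model_bus_notifier_init();
	model_initcall_pseries_model_bus_notifier_init();

	return model_notifier_registrations;
}
```

In this model `model_boot("pseries")` and `model_boot("powernv")` each return 1, and a platform matching neither name returns 0, so a kernel built with both platforms enabled does not register the notifier twice.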