diff mbox series

[v2,01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE

Message ID 20191121134918.7155-2-fbarrat@linux.ibm.com (mailing list archive)
State Accepted
Commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33
Headers show
Series opencapi: enable card reset and link retraining | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (784eee1cc44801366d4f197e0ade7739ee8e1e83)
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 46 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Frederic Barrat Nov. 21, 2019, 1:49 p.m. UTC
The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").

We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.

Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference and
simply removing it now is dangerous and would likely require more
work. There's currently no release device callback for nvlink devices
for example. So to be safe, this patch leaks a reference on the npu
device, but only for nvlink and not opencapi.

CC: aik@ozlabs.ru
CC: oohall@gmail.com
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---

Changelog:
v2:
 - clarify the ref counting (and leak) done on npu devices when
   setting up the PE
 - rework commit message


 arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

Comments

Andrew Donnellan Jan. 8, 2020, 7:33 a.m. UTC | #1
On 22/11/19 12:49 am, Frederic Barrat wrote:
> The pci_dn structure used to store a pointer to the struct pci_dev, so
> taking a reference on the device was required. However, the pci_dev
> pointer was later removed from the pci_dn structure, but the reference
> was kept for the npu device.
> See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
> pcidev from pci_dn").
> 
> We don't need to take a reference on the device when assigning the PE
> as the struct pnv_ioda_pe is cleaned up at the same time as
> the (physical) device is released. Doing so prevents the device from
> being released, which is a problem for opencapi devices, since we want
> to be able to remove them through PCI hotplug.
> 
> Now the ugly part: nvlink npu devices are not meant to be
> released. Because of the above, we've always leaked a reference and
> simply removing it now is dangerous and would likely require more
> work. There's currently no release device callback for nvlink devices
> for example. So to be safe, this patch leaks a reference on the npu
> device, but only for nvlink and not opencapi.
> 
> CC: aik@ozlabs.ru
> CC: oohall@gmail.com
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

It took me a while to parse exactly what you're doing here.

- In pnv_ioda_setup_dev_PE(), we take a reference on the pci_dev, this 
is to protect a pointer in the pci_dn structure, but not to protect the 
pointer in the pci_dev structure (which doesn't need to be protected by 
taking a reference, because the lifetime of the pnv_ioda_pe is the same 
as the pci_dev).

- The pointer in the pci_dn structure has now been removed, so we should 
remove the pci_dev_get() accordingly.

This seems okay to me, though anything PE-related is irritatingly hairy 
so one can never be truly certain...

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Frederic Barrat Jan. 8, 2020, 2:11 p.m. UTC | #2
Le 08/01/2020 à 08:33, Andrew Donnellan a écrit :
> On 22/11/19 12:49 am, Frederic Barrat wrote:
>> The pci_dn structure used to store a pointer to the struct pci_dev, so
>> taking a reference on the device was required. However, the pci_dev
>> pointer was later removed from the pci_dn structure, but the reference
>> was kept for the npu device.
>> See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
>> pcidev from pci_dn").
>>
>> We don't need to take a reference on the device when assigning the PE
>> as the struct pnv_ioda_pe is cleaned up at the same time as
>> the (physical) device is released. Doing so prevents the device from
>> being released, which is a problem for opencapi devices, since we want
>> to be able to remove them through PCI hotplug.
>>
>> Now the ugly part: nvlink npu devices are not meant to be
>> released. Because of the above, we've always leaked a reference and
>> simply removing it now is dangerous and would likely require more
>> work. There's currently no release device callback for nvlink devices
>> for example. So to be safe, this patch leaks a reference on the npu
>> device, but only for nvlink and not opencapi.
>>
>> CC: aik@ozlabs.ru
>> CC: oohall@gmail.com
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> 
> It took me a while to parse exactly what you're doing here.
> 
> - In pnv_ioda_setup_dev_PE(), we take a reference on the pci_dev, this 
> is to protect a pointer in the pci_dn structure, but not to protect the 
> pointer in the pci_dev structure (which doesn't need to be protected by 
> taking a reference, because the lifetime of the pnv_ioda_pe is the same 
> as the pci_dev).
> 
> - The pointer in the pci_dn structure has now been removed, so we should 
> remove the pci_dev_get() accordingly.


Correct. Did I do such a bad job explaining it in the commit message 
that I need to rephrase?

   Fred


> This seems okay to me, though anything PE-related is irritatingly hairy 
> so one can never be truly certain...
> 
> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
>
Andrew Donnellan Jan. 8, 2020, 8:19 p.m. UTC | #3
On 9/1/20 1:11 am, Frederic Barrat wrote:
>> It took me a while to parse exactly what you're doing here.
>>
>> - In pnv_ioda_setup_dev_PE(), we take a reference on the pci_dev, this 
>> is to protect a pointer in the pci_dn structure, but not to protect 
>> the pointer in the pci_dev structure (which doesn't need to be 
>> protected by taking a reference, because the lifetime of the 
>> pnv_ioda_pe is the same as the pci_dev).
>>
>> - The pointer in the pci_dn structure has now been removed, so we 
>> should remove the pci_dev_get() accordingly.
> 
> 
> Correct. Did I do such a bad job explaining it in the commit message 
> that I need to rephrase?

I might just be a bit slow :)
Michael Ellerman Jan. 29, 2020, 5:17 a.m. UTC | #4
On Thu, 2019-11-21 at 13:49:08 UTC, Frederic Barrat wrote:
> The pci_dn structure used to store a pointer to the struct pci_dev, so
> taking a reference on the device was required. However, the pci_dev
> pointer was later removed from the pci_dn structure, but the reference
> was kept for the npu device.
> See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
> pcidev from pci_dn").
> 
> We don't need to take a reference on the device when assigning the PE
> as the struct pnv_ioda_pe is cleaned up at the same time as
> the (physical) device is released. Doing so prevents the device from
> being released, which is a problem for opencapi devices, since we want
> to be able to remove them through PCI hotplug.
> 
> Now the ugly part: nvlink npu devices are not meant to be
> released. Because of the above, we've always leaked a reference and
> simply removing it now is dangerous and would likely require more
> work. There's currently no release device callback for nvlink devices
> for example. So to be safe, this patch leaks a reference on the npu
> device, but only for nvlink and not opencapi.
> 
> CC: aik@ozlabs.ru
> CC: oohall@gmail.com
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/05dd7da76986937fb288b4213b1fa10dbe0d1b33

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c28d0d9b7ee0..155333872cb4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1062,14 +1062,13 @@  static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		return NULL;
 	}
 
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
+	/* NOTE: We don't get a reference for the pointer in the PE
+	 * data structure, both the device and PE structures should be
+	 * destroyed at the same time. However, removing nvlink
+	 * devices will need some work.
 	 *
 	 * At some point we want to remove the PDN completely anyways
 	 */
-	pci_dev_get(dev);
 	pdn->pe_number = pe->pe_number;
 	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
@@ -1084,7 +1083,6 @@  static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		pnv_ioda_free_pe(pe);
 		pdn->pe_number = IODA_INVALID_PE;
 		pe->pdev = NULL;
-		pci_dev_put(dev);
 		return NULL;
 	}
 
@@ -1205,6 +1203,14 @@  static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 	struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
 
+	/*
+	 * Intentionally leak a reference on the npu device (for
+	 * nvlink only; this is not an opencapi path) to make sure it
+	 * never goes away, as it's been the case all along and some
+	 * work is needed otherwise.
+	 */
+	pci_dev_get(npu_pdev);
+
 	/*
 	 * Due to a hardware errata PE#0 on the NPU is reserved for
 	 * error handling. This means we only have three PEs remaining
@@ -1228,7 +1234,6 @@  static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 			 */
 			dev_info(&npu_pdev->dev,
 				"Associating to existing PE %x\n", pe_num);
-			pci_dev_get(npu_pdev);
 			npu_pdn = pci_get_pdn(npu_pdev);
 			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
 			npu_pdn->pe_number = pe_num;