Message ID | 20190821062655.19735-1-oohall@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 1fb4124ca9d456656a324f1ee29b7bf942f59ac8 |
Series | [1/3] powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (c9633332103e55bc73d80d07ead28b95a22a85a3) |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 23 lines checked |
On Wed, Aug 21, 2019 at 04:26:53PM +1000, Oliver O'Halloran wrote:
> When disabling virtual functions on an SR-IOV adapter we currently do not
> correctly remove the EEH state for the now-dead virtual functions. When
> removing the pci_dn that was created for the VF when SR-IOV was enabled
> we free the corresponding eeh_dev without removing it from the child device
> list of the eeh_pe that contained it. This can result in crashes due to the
> use-after-free.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> No Fixes: here since I'm not sure if the commit that added this actually
> introduced the bug. EEH is amazing.

Yep.

> I suspect backporting this would cause more problems than it solves since
> reliably replicating the crash required enabling memory poisoning and
> hacking a device driver to remove the PCI error handling callbacks so
> the EEH fallback path (which removes and re-probes PCI devices)
> would be used.

I gave this a quick test with some added instrumentation, and I can see
that the new code is used during VF removal and it doesn't cause any new
problems.

I agree that even if it's difficult to trigger, it was definitely a bug.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
>  arch/powerpc/kernel/pci_dn.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 6556b57..795c4e3 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -244,9 +244,22 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>  				continue;
>  
>  #ifdef CONFIG_EEH
> -			/* Release EEH device for the VF */
> +			/*
> +			 * Release EEH state for this VF. The PCI core
> +			 * has already torn down the pci_dev for this VF, but
> +			 * we're responsible to removing the eeh_dev since it
> +			 * has the same lifetime as the pci_dn that spawned it.
> +			 */
>  			edev = pdn_to_eeh_dev(pdn);
>  			if (edev) {
> +				/*
> +				 * We allocate pci_dn's for the totalvfs count,
> +				 * but only only the vfs that were activated
> +				 * have a configured PE.
> +				 */
> +				if (edev->pe)
> +					eeh_rmv_from_parent_pe(edev);
> +
>  				pdn->edev = NULL;
>  				kfree(edev);
>  			}
> --
> 2.9.5
>
On Wed, 2019-08-21 at 06:26:53 UTC, Oliver O'Halloran wrote:
> When disabling virtual functions on an SR-IOV adapter we currently do not
> correctly remove the EEH state for the now-dead virtual functions. When
> removing the pci_dn that was created for the VF when SR-IOV was enabled
> we free the corresponding eeh_dev without removing it from the child device
> list of the eeh_pe that contained it. This can result in crashes due to the
> use-after-free.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1fb4124ca9d456656a324f1ee29b7bf942f59ac8

cheers
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 6556b57..795c4e3 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -244,9 +244,22 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 				continue;
 
 #ifdef CONFIG_EEH
-			/* Release EEH device for the VF */
+			/*
+			 * Release EEH state for this VF. The PCI core
+			 * has already torn down the pci_dev for this VF, but
+			 * we're responsible to removing the eeh_dev since it
+			 * has the same lifetime as the pci_dn that spawned it.
+			 */
 			edev = pdn_to_eeh_dev(pdn);
 			if (edev) {
+				/*
+				 * We allocate pci_dn's for the totalvfs count,
+				 * but only only the vfs that were activated
+				 * have a configured PE.
+				 */
+				if (edev->pe)
+					eeh_rmv_from_parent_pe(edev);
+
 				pdn->edev = NULL;
 				kfree(edev);
 			}
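The ordering the patch enforces (unlink from the parent's child list, then free, and skip the unlink for devices that never had a parent) can be illustrated outside the kernel. What follows is only a minimal userspace sketch: `toy_pe`, `toy_dev` and the `toy_*` helpers are hypothetical stand-ins invented for this illustration, not the kernel's `eeh_pe`, `eeh_dev` or `eeh_rmv_from_parent_pe()`, and the singly linked list is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for eeh_pe and eeh_dev; not the kernel definitions. */
struct toy_dev;

struct toy_pe {
	struct toy_dev *first;	/* head of the child device list */
};

struct toy_dev {
	struct toy_pe *pe;	/* NULL if the device was never attached */
	struct toy_dev *next;	/* next sibling on the parent's list */
};

/* Attach dev at the head of pe's child list. */
static void toy_attach(struct toy_pe *pe, struct toy_dev *dev)
{
	dev->pe = pe;
	dev->next = pe->first;
	pe->first = dev;
}

/* Unlink dev from its parent's child list. The caller must check dev->pe. */
static void toy_detach(struct toy_dev *dev)
{
	struct toy_dev **link;

	assert(dev->pe != NULL);
	link = &dev->pe->first;
	while (*link && *link != dev)
		link = &(*link)->next;
	if (*link)
		*link = dev->next;
	dev->pe = NULL;
	dev->next = NULL;
}

/*
 * Release a device. Unlinking before free() is the step the patch adds;
 * without it the parent's list would keep a pointer to freed memory.
 */
static void toy_release(struct toy_dev *dev)
{
	if (dev->pe)	/* devices that were never attached have no parent */
		toy_detach(dev);
	free(dev);
}

/* Count the devices still reachable from pe. */
static size_t toy_count(const struct toy_pe *pe)
{
	const struct toy_dev *d;
	size_t n = 0;

	for (d = pe->first; d; d = d->next)
		n++;
	return n;
}
```

In this sketch, walking the list with `toy_count()` after `toy_release()` only ever visits live devices, and releasing a device whose `pe` is NULL goes straight to `free()`, mirroring the `if (edev->pe)` guard in the diff.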
When disabling virtual functions on an SR-IOV adapter we currently do not
correctly remove the EEH state for the now-dead virtual functions. When
removing the pci_dn that was created for the VF when SR-IOV was enabled
we free the corresponding eeh_dev without removing it from the child device
list of the eeh_pe that contained it. This can result in crashes due to the
use-after-free.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
No Fixes: here since I'm not sure if the commit that added this actually
introduced the bug. EEH is amazing.

I suspect backporting this would cause more problems than it solves since
reliably replicating the crash required enabling memory poisoning and
hacking a device driver to remove the PCI error handling callbacks so
the EEH fallback path (which removes and re-probes PCI devices)
would be used.
---
 arch/powerpc/kernel/pci_dn.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
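The note about needing memory poisoning to reproduce the crash reliably can be illustrated with a small userspace sketch. A stale pointer into freed memory often still sees the old, plausible-looking contents. Once the freed object is overwritten with a known byte pattern, a later read through the stale pointer becomes obviously wrong. Everything here is hypothetical and for illustration only: `toy_node`, `toy_poison_free()` and `toy_is_poisoned()` are invented names, the storage is caller-owned so that inspecting it afterwards stays well defined, and the only detail borrowed from the kernel is the 0x6b byte value of `POISON_FREE`.

```c
#include <stddef.h>
#include <string.h>

#define TOY_POISON 0x6b	/* same byte value as the kernel's POISON_FREE */

/* Hypothetical list node standing in for a device on a child list. */
struct toy_node {
	struct toy_node *next;
	int id;
};

/*
 * "Free" a node by overwriting it with the poison byte. The storage stays
 * valid because the caller owns it, so looking at it afterwards is well
 * defined in this sketch, unlike touching genuinely freed heap memory.
 */
static void toy_poison_free(struct toy_node *n)
{
	memset(n, TOY_POISON, sizeof(*n));
}

/* Return 1 if every byte of the node carries the poison pattern. */
static int toy_is_poisoned(const struct toy_node *n)
{
	const unsigned char *p = (const unsigned char *)n;
	size_t i;

	for (i = 0; i < sizeof(*n); i++)
		if (p[i] != TOY_POISON)
			return 0;
	return 1;
}
```

A list head that still points at a node after `toy_poison_free()` would find a `next` field made of 0x6b bytes rather than a valid pointer, which is roughly why a poisoned kernel tends to crash promptly where an unpoisoned one might appear to keep working.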