Message ID | 1452532306-31766-1-git-send-email-tim.gardner@canonical.com |
---|---|
State | New |
Headers | show |
On 01/11/2016 11:11 AM, tim.gardner@canonical.com wrote: > From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com> > > BugLink: http://bugs.launchpad.net/bugs/1486180 > > Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") > added a check on function eeh_add_device_early(): since in Cell arch eeh_ops > is NULL, that code used to crash on Cell. The commit's approach was validate > if EEH was available by checking the result of function eeh_enabled(). > > Since the function eeh_add_device_early() is used to perform EEH > initialization in devices added later on the system, like in hotplug/DLPAR > scenarios, we might reach a case in which no PCI devices are present on boot > and so EEH is not initialized. Then, if a device is added via DLPAR for > example, eeh_add_device_early() fails because eeh_enabled() is false. > > We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails: > if we have no PCI devices on machine at boot time, and then we add a PCI device > via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer > dereference in the line "cfg_addr = edev->config_addr;". It happens because > config_addr in edev is NULL, since the function eeh_add_device_early() was not > completed successfully. > > This patch just changes the way the arch checking is done in function > eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the > running architecture by using the macro machine_is(). If we are running on > pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out > the function. This way, we don't enable EEH on Cell and we don't hit the oops > on DLPAR either. > > Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") > Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> > Signed-off-by: Tim Gardner <tim.gardner@canonical.com> > --- > arch/powerpc/kernel/eeh.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c > index 01c961d..3203895 100644 > --- a/arch/powerpc/kernel/eeh.c > +++ b/arch/powerpc/kernel/eeh.c > @@ -1071,7 +1071,13 @@ void eeh_add_device_early(struct pci_dn *pdn) > struct pci_controller *phb; > struct eeh_dev *edev = pdn_to_eeh_dev(pdn); > > - if (!edev || !eeh_enabled()) > + if (!edev) > + return; > + > + /* Some platforms (like Cell) don't have EEH capabilities, so we > + * need to abort here. In case of pseries or powernv, we have EEH > + * so we can continue. */ > + if (!machine_is(pseries) && !machine_is(powernv)) > return; > > if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE)) >
On 11.01.2016 18:11, tim.gardner@canonical.com wrote: > From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com> > > BugLink: http://bugs.launchpad.net/bugs/1486180 > > Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") > added a check on function eeh_add_device_early(): since in Cell arch eeh_ops > is NULL, that code used to crash on Cell. The commit's approach was validate > if EEH was available by checking the result of function eeh_enabled(). > > Since the function eeh_add_device_early() is used to perform EEH > initialization in devices added later on the system, like in hotplug/DLPAR > scenarios, we might reach a case in which no PCI devices are present on boot > and so EEH is not initialized. Then, if a device is added via DLPAR for > example, eeh_add_device_early() fails because eeh_enabled() is false. > > We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails: > if we have no PCI devices on machine at boot time, and then we add a PCI device > via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer > dereference in the line "cfg_addr = edev->config_addr;". It happens because > config_addr in edev is NULL, since the function eeh_add_device_early() was not > completed successfully. > > This patch just changes the way the arch checking is done in function > eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the > running architecture by using the macro machine_is(). If we are running on > pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out > the function. This way, we don't enable EEH on Cell and we don't hit the oops > on DLPAR either. > > Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") > Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> cherry-picked from? Otherwise looks consistent with description. > Signed-off-by: Tim Gardner <tim.gardner@canonical.com> > --- > arch/powerpc/kernel/eeh.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c > index 01c961d..3203895 100644 > --- a/arch/powerpc/kernel/eeh.c > +++ b/arch/powerpc/kernel/eeh.c > @@ -1071,7 +1071,13 @@ void eeh_add_device_early(struct pci_dn *pdn) > struct pci_controller *phb; > struct eeh_dev *edev = pdn_to_eeh_dev(pdn); > > - if (!edev || !eeh_enabled()) > + if (!edev) > + return; > + > + /* Some platforms (like Cell) don't have EEH capabilities, so we > + * need to abort here. In case of pseries or powernv, we have EEH > + * so we can continue. */ > + if (!machine_is(pseries) && !machine_is(powernv)) > return; > > if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE)) >
On 01/12/2016 02:58 AM, Stefan Bader wrote: > On 11.01.2016 18:11, tim.gardner@canonical.com wrote: >> From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com> >> >> BugLink: http://bugs.launchpad.net/bugs/1486180 >> >> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") >> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops >> is NULL, that code used to crash on Cell. The commit's approach was validate >> if EEH was available by checking the result of function eeh_enabled(). >> >> Since the function eeh_add_device_early() is used to perform EEH >> initialization in devices added later on the system, like in hotplug/DLPAR >> scenarios, we might reach a case in which no PCI devices are present on boot >> and so EEH is not initialized. Then, if a device is added via DLPAR for >> example, eeh_add_device_early() fails because eeh_enabled() is false. >> >> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails: >> if we have no PCI devices on machine at boot time, and then we add a PCI device >> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer >> dereference in the line "cfg_addr = edev->config_addr;". It happens because >> config_addr in edev is NULL, since the function eeh_add_device_early() was not >> completed successfully. >> >> This patch just changes the way the arch checking is done in function >> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the >> running architecture by using the macro machine_is(). If we are running on >> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out >> the function. This way, we don't enable EEH on Cell and we don't hit the oops >> on DLPAR either. >> >> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> > > cherry-picked from? Otherwise looks consistent with description. > No cherry-pick. It comes right from the bug report (and is on the powerpc-dev list). IBM thought it was important to fix the OOPS. It'll make its way upstream eventually I suppose. rtg
On 12.01.2016 14:07, Tim Gardner wrote: > On 01/12/2016 02:58 AM, Stefan Bader wrote: >> On 11.01.2016 18:11, tim.gardner@canonical.com wrote: >>> From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com> >>> >>> BugLink: http://bugs.launchpad.net/bugs/1486180 >>> >>> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") >>> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops >>> is NULL, that code used to crash on Cell. The commit's approach was validate >>> if EEH was available by checking the result of function eeh_enabled(). >>> >>> Since the function eeh_add_device_early() is used to perform EEH >>> initialization in devices added later on the system, like in hotplug/DLPAR >>> scenarios, we might reach a case in which no PCI devices are present on boot >>> and so EEH is not initialized. Then, if a device is added via DLPAR for >>> example, eeh_add_device_early() fails because eeh_enabled() is false. >>> >>> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails: >>> if we have no PCI devices on machine at boot time, and then we add a PCI device >>> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer >>> dereference in the line "cfg_addr = edev->config_addr;". It happens because >>> config_addr in edev is NULL, since the function eeh_add_device_early() was not >>> completed successfully. >>> >>> This patch just changes the way the arch checking is done in function >>> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the >>> running architecture by using the macro machine_is(). If we are running on >>> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out >>> the function. This way, we don't enable EEH on Cell and we don't hit the oops >>> on DLPAR either. >>> >>> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") >>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> >> >> cherry-picked from? Otherwise looks consistent with description. >> > > No cherry-pick. It comes right from the bug report (and is on the powerpc-dev > list). IBM thought it was important to fix the OOPS. It'll make its way upstream > eventually I suppose. Doh! Right, that is why it is SAUCE... a fact which must have blanked completely this morning. Maybe an artificial "picked from LP bug" or a link like Kamal did in his recent SAUCE submission would be helpful. Though its hard to get through missing coffee layers... -Stefan > > rtg
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 01c961d..3203895 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -1071,7 +1071,13 @@ void eeh_add_device_early(struct pci_dn *pdn) struct pci_controller *phb; struct eeh_dev *edev = pdn_to_eeh_dev(pdn); - if (!edev || !eeh_enabled()) + if (!edev) + return; + + /* Some platforms (like Cell) don't have EEH capabilities, so we + * need to abort here. In case of pseries or powernv, we have EEH + * so we can continue. */ + if (!machine_is(pseries) && !machine_is(powernv)) return; if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))