diff mbox

[Wily/Xenial,SRU] UBUNTU: SAUCE: powerpc/eeh: Validate arch in eeh_add_device_early()

Message ID 1452532306-31766-1-git-send-email-tim.gardner@canonical.com
State New
Headers show

Commit Message

Tim Gardner Jan. 11, 2016, 5:11 p.m. UTC
From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>

BugLink: http://bugs.launchpad.net/bugs/1486180

Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
is NULL, that code used to crash on Cell. The commit's approach was validate
if EEH was available by checking the result of function eeh_enabled().

Since the function eeh_add_device_early() is used to perform EEH
initialization in devices added later on the system, like in hotplug/DLPAR
scenarios, we might reach a case in which no PCI devices are present on boot
and so EEH is not initialized. Then, if a device is added via DLPAR for
example, eeh_add_device_early() fails because eeh_enabled() is false.

We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
if we have no PCI devices on machine at boot time, and then we add a PCI device
via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
dereference in the line "cfg_addr = edev->config_addr;". It happens because
config_addr in edev is NULL, since the function eeh_add_device_early() was not
completed successfully.

This patch just changes the way the arch checking is done in function
eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
running architecture by using the macro machine_is(). If we are running on
pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
the function. This way, we don't enable EEH on Cell and we don't hit the oops
on DLPAR either.

Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
 arch/powerpc/kernel/eeh.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Chris J Arges Jan. 11, 2016, 7:42 p.m. UTC | #1
On 01/11/2016 11:11 AM, tim.gardner@canonical.com wrote:
> From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1486180
> 
> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
> is NULL, that code used to crash on Cell. The commit's approach was validate
> if EEH was available by checking the result of function eeh_enabled().
> 
> Since the function eeh_add_device_early() is used to perform EEH
> initialization in devices added later on the system, like in hotplug/DLPAR
> scenarios, we might reach a case in which no PCI devices are present on boot
> and so EEH is not initialized. Then, if a device is added via DLPAR for
> example, eeh_add_device_early() fails because eeh_enabled() is false.
> 
> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
> if we have no PCI devices on machine at boot time, and then we add a PCI device
> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
> dereference in the line "cfg_addr = edev->config_addr;". It happens because
> config_addr in edev is NULL, since the function eeh_add_device_early() was not
> completed successfully.
> 
> This patch just changes the way the arch checking is done in function
> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
> running architecture by using the macro machine_is(). If we are running on
> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
> the function. This way, we don't enable EEH on Cell and we don't hit the oops
> on DLPAR either.
> 
> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> ---
>  arch/powerpc/kernel/eeh.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 01c961d..3203895 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1071,7 +1071,13 @@ void eeh_add_device_early(struct pci_dn *pdn)
>  	struct pci_controller *phb;
>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>  
> -	if (!edev || !eeh_enabled())
> +	if (!edev)
> +		return;
> +
> +	/* Some platforms (like Cell) don't have EEH capabilities, so we
> +	 * need to abort here. In case of pseries or powernv, we have EEH
> +	 * so we can continue. */
> +	if (!machine_is(pseries) && !machine_is(powernv))
>  		return;
>  
>  	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
>
Stefan Bader Jan. 12, 2016, 9:58 a.m. UTC | #2
On 11.01.2016 18:11, tim.gardner@canonical.com wrote:
> From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1486180
> 
> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
> is NULL, that code used to crash on Cell. The commit's approach was validate
> if EEH was available by checking the result of function eeh_enabled().
> 
> Since the function eeh_add_device_early() is used to perform EEH
> initialization in devices added later on the system, like in hotplug/DLPAR
> scenarios, we might reach a case in which no PCI devices are present on boot
> and so EEH is not initialized. Then, if a device is added via DLPAR for
> example, eeh_add_device_early() fails because eeh_enabled() is false.
> 
> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
> if we have no PCI devices on machine at boot time, and then we add a PCI device
> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
> dereference in the line "cfg_addr = edev->config_addr;". It happens because
> config_addr in edev is NULL, since the function eeh_add_device_early() was not
> completed successfully.
> 
> This patch just changes the way the arch checking is done in function
> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
> running architecture by using the macro machine_is(). If we are running on
> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
> the function. This way, we don't enable EEH on Cell and we don't hit the oops
> on DLPAR either.
> 
> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

cherry-picked from? Otherwise looks consistent with description.

> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> ---
>  arch/powerpc/kernel/eeh.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 01c961d..3203895 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1071,7 +1071,13 @@ void eeh_add_device_early(struct pci_dn *pdn)
>  	struct pci_controller *phb;
>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>  
> -	if (!edev || !eeh_enabled())
> +	if (!edev)
> +		return;
> +
> +	/* Some platforms (like Cell) don't have EEH capabilities, so we
> +	 * need to abort here. In case of pseries or powernv, we have EEH
> +	 * so we can continue. */
> +	if (!machine_is(pseries) && !machine_is(powernv))
>  		return;
>  
>  	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
>
Tim Gardner Jan. 12, 2016, 1:07 p.m. UTC | #3
On 01/12/2016 02:58 AM, Stefan Bader wrote:
> On 11.01.2016 18:11, tim.gardner@canonical.com wrote:
>> From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
>>
>> BugLink: http://bugs.launchpad.net/bugs/1486180
>>
>> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
>> is NULL, that code used to crash on Cell. The commit's approach was validate
>> if EEH was available by checking the result of function eeh_enabled().
>>
>> Since the function eeh_add_device_early() is used to perform EEH
>> initialization in devices added later on the system, like in hotplug/DLPAR
>> scenarios, we might reach a case in which no PCI devices are present on boot
>> and so EEH is not initialized. Then, if a device is added via DLPAR for
>> example, eeh_add_device_early() fails because eeh_enabled() is false.
>>
>> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
>> if we have no PCI devices on machine at boot time, and then we add a PCI device
>> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
>> dereference in the line "cfg_addr = edev->config_addr;". It happens because
>> config_addr in edev is NULL, since the function eeh_add_device_early() was not
>> completed successfully.
>>
>> This patch just changes the way the arch checking is done in function
>> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
>> running architecture by using the macro machine_is(). If we are running on
>> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
>> the function. This way, we don't enable EEH on Cell and we don't hit the oops
>> on DLPAR either.
>>
>> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
>
> cherry-picked from? Otherwise looks consistent with description.
>

No cherry-pick. It comes right from the bug report (and is on the 
powerpc-dev list). IBM thought it was important to fix the OOPS. It'll 
make its way upstream eventually I suppose.

rtg
Stefan Bader Jan. 12, 2016, 2:26 p.m. UTC | #4
On 12.01.2016 14:07, Tim Gardner wrote:
> On 01/12/2016 02:58 AM, Stefan Bader wrote:
>> On 11.01.2016 18:11, tim.gardner@canonical.com wrote:
>>> From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
>>>
>>> BugLink: http://bugs.launchpad.net/bugs/1486180
>>>
>>> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>>> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
>>> is NULL, that code used to crash on Cell. The commit's approach was validate
>>> if EEH was available by checking the result of function eeh_enabled().
>>>
>>> Since the function eeh_add_device_early() is used to perform EEH
>>> initialization in devices added later on the system, like in hotplug/DLPAR
>>> scenarios, we might reach a case in which no PCI devices are present on boot
>>> and so EEH is not initialized. Then, if a device is added via DLPAR for
>>> example, eeh_add_device_early() fails because eeh_enabled() is false.
>>>
>>> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
>>> if we have no PCI devices on machine at boot time, and then we add a PCI device
>>> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
>>> dereference in the line "cfg_addr = edev->config_addr;". It happens because
>>> config_addr in edev is NULL, since the function eeh_add_device_early() was not
>>> completed successfully.
>>>
>>> This patch just changes the way the arch checking is done in function
>>> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
>>> running architecture by using the macro machine_is(). If we are running on
>>> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
>>> the function. This way, we don't enable EEH on Cell and we don't hit the oops
>>> on DLPAR either.
>>>
>>> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
>>
>> cherry-picked from? Otherwise looks consistent with description.
>>
> 
> No cherry-pick. It comes right from the bug report (and is on the powerpc-dev
> list). IBM thought it was important to fix the OOPS. It'll make its way upstream
> eventually I suppose.

Doh! Right, that is why it is SAUCE... a fact which must have blanked completely
this morning. Maybe an artificial "picked from LP bug" or a link like Kamal did
in his recent SAUCE submission would be helpful. Though its hard to get through
missing coffee layers...

-Stefan

> 
> rtg
Kamal Mostafa Jan. 12, 2016, 5:55 p.m. UTC | #5

diff mbox

Patch

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 01c961d..3203895 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1071,7 +1071,13 @@  void eeh_add_device_early(struct pci_dn *pdn)
 	struct pci_controller *phb;
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
 
-	if (!edev || !eeh_enabled())
+	if (!edev)
+		return;
+
+	/* Some platforms (like Cell) don't have EEH capabilities, so we
+	 * need to abort here. In case of pseries or powernv, we have EEH
+	 * so we can continue. */
+	if (!machine_is(pseries) && !machine_is(powernv))
 		return;
 
 	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))