IBM OpenPower 720 ipr driver woes

Message ID 20130604035223.GA27964@shangw.(null) (mailing list archive)
State Superseded
Delegated to: Benjamin Herrenschmidt

Commit Message

Gavin Shan June 4, 2013, 3:52 a.m. UTC
On Tue, Jun 04, 2013 at 01:16:52PM +1000, Tony Breeds wrote:
>On Mon, Jun 03, 2013 at 09:40:52PM -0400, Robert Knight wrote:
>> On 6/3/2013 8:01 PM, Tony Breeds wrote:
>> >On Mon, Jun 03, 2013 at 05:20:12PM -0400, Robert Knight wrote:
>> >
>> >>>Device tree struct  0x0000000004820000 -> 0x0000000004840000
>> >>>Calling quiesce...
>> >>>returning from prom_init
>> >>>[    1.376359] ehci-pci 0000:c8:01.2: can't setup
>> >Can you try adding "debug" to the kernel commandline.  We're missing a
>> >great chunk of detail. If you're starting from scratch either try F18 or
>> >the F19 Beta (if you're brave :))
>> >
>> >Yours Tony
>> So, two points.  Since I have no live disks, I can't copy the dmesg
>> output in dracut onto a disk and just sent it to you -- the only way
>> that I know to get it is cut and paste from a HMC console window
>> into a vi buffer.
>
>Okay, when I've been in that situation, I run "script" locally and then
>connect to the HMC console.
>
>When you're done you can exit script and then a file called typescript
>will exist in the directory you ran script in.
>
>It's a lower-overhead way of doing what you're already doing.
> [    0.087097] NET: Registered protocol family 16
>
>> [    0.087144] pseries_eeh_init: RTAS service <ibm,get-config-addr-info2> and <ibm,get-config-addr-info> invalid
>> [    0.087155] eeh_init: Failed to call platform init function (-22)
>
>Hmm this seems pretty strange to me.  Gavin are these RTAS tokens
>supported on older power5 boxes?
>

Yes, Tony. "ibm,get-config-addr-info" should be supported on Power5 boxes.
Newer Power boxes (e.g. P7) should support "ibm,get-config-addr-info2".

Please have a try on the attached patch, which is based on mainline (3.10).

Thanks,
Gavin

Comments

Robert Knight June 5, 2013, 9:14 p.m. UTC | #1
On 6/3/2013 11:52 PM, Gavin Shan wrote:
> Yes, Tony. "ibm,get-config-addr-info" should be supported on Power5 box.
> Newer PowerBox (e.g. P7) should support "ibm,get-config-addr-info2"
>
> Please have a try on the attached patch, which is based on mainline (3.10).
>
> Thanks,
> Gavin
>
>
The system boots with that patch.  I applied it to kernel-3.8.11-100.
Brian King June 6, 2013, 11:32 a.m. UTC | #2
On 06/05/2013 04:14 PM, Robert Knight wrote:
> The system boots with that patch.  I applied it to kernel-3.8.11-100.

Does that patch resolve all your issues, or are there still issues with ipr remaining
after applying the patch?

Thanks,

Brian
Robert Knight June 6, 2013, 12:39 p.m. UTC | #3
On 06/06/2013 07:32 AM, Brian King wrote:
>> The system boots with that patch.  I applied it to kernel-3.8.11-100.
> Does that patch resolve all your issues, or are there still issues with ipr remaining
> after applying the patch?
>
> Thanks,
>
> Brian
>
Yes.  I've started rebuilding the kernel and I'm up to the module 
building part, so I'd say it is solid.  Will this patch make it into 
some version of the kernel?

What was killing me was that it would not complete boot.  It now does.  
I see:

[   11.934481] scsi 0:0:15:0: Resetting device
[   11.934813] ipr 0001:d0:01.0: Adapter being reset as a result of error recovery.

on each boot.  It does not appear to affect operation.

Thank you and the rest of the team for your rapid and helpful responses.

Best regards,
Robert
Gavin Shan June 7, 2013, 12:24 a.m. UTC | #4
On Thu, Jun 06, 2013 at 08:39:45AM -0400, Robert Knight wrote:
>On 06/06/2013 07:32 AM, Brian King wrote:
>>On 06/05/2013 04:14 PM, Robert Knight wrote:
>>>On 6/3/2013 11:52 PM, Gavin Shan wrote:
>>>>On Tue, Jun 04, 2013 at 01:16:52PM +1000, Tony Breeds wrote:
>>>>>On Mon, Jun 03, 2013 at 09:40:52PM -0400, Robert Knight wrote:
>>>>>>On 6/3/2013 8:01 PM, Tony Breeds wrote:
>>>>>>>On Mon, Jun 03, 2013 at 05:20:12PM -0400, Robert Knight wrote:

.../...

>Yes.  I've started rebuilding the kernel and I'm up to the module
>building part, so I'd say it is solid.  Will this patch make it into
>some version of the kernel?
>

The patch is being pushed to mainline or linux-next, and backported
to stable-kernel (v3.4+)


>What was killing me was that it would not complete boot.  It now
>does.  I see:
>
>[   11.934481] scsi 0:0:15:0: Resetting device
>[   11.934813] ipr 0001:d0:01.0: Adapter being reset as a result of error recovery.
>
>on each boot.  It does not appear to affect operation.
>
>Thank you and the rest of the team for your rapid and helpful responses.
>

Thanks,
Gavin
Robert Knight Oct. 17, 2013, 2:57 p.m. UTC | #5
On 06/06/2013 08:24 PM, Gavin Shan wrote:
> On Thu, Jun 06, 2013 at 08:39:45AM -0400, Robert Knight wrote:
>> On 06/06/2013 07:32 AM, Brian King wrote:
>>> On 06/05/2013 04:14 PM, Robert Knight wrote:
>>>> On 6/3/2013 11:52 PM, Gavin Shan wrote:
>>>>> On Tue, Jun 04, 2013 at 01:16:52PM +1000, Tony Breeds wrote:
>>>>>> On Mon, Jun 03, 2013 at 09:40:52PM -0400, Robert Knight wrote:
>>>>>>> On 6/3/2013 8:01 PM, Tony Breeds wrote:
>>>>>>>> On Mon, Jun 03, 2013 at 05:20:12PM -0400, Robert Knight wrote:
> .../...
>
>> Yes.  I've started rebuilding the kernel and I'm up to the module
>> building part, so I'd say it is solid.  Will this patch make it into
>> some version of the kernel?
>>
> The patch is being pushed to mainline or linux-next, and backported
> to stable-kernel (v3.4+)
>
>
>> What was killing me was that it would not complete boot.  It now
>> does.  I see:
>>
>> [   11.934481] scsi 0:0:15:0: Resetting device
>> [   11.934813] ipr 0001:d0:01.0: Adapter being reset as a result of error recovery.
>>
>> on each boot.  It does not appear to affect operation.
>>
>> Thank you and the rest of the team for your rapid and helpful responses.
>>
> Thanks,
> Gavin
>
Well, it's four months later and I'm trying to get Fedora 20 Alpha to 
install on that same machine.  It appears to still have the same 
problem.  Did that patch ever make it into the mainline?

Strangely, the kernel from the installer (using DVD image) does NOT have 
the problem, only the installed system.

Regards,
Robert
Robert Knight Oct. 25, 2013, 9:16 p.m. UTC | #6
On 10/17/2013 10:57 AM, Robert Knight wrote:
> Well, it's four months later and I'm trying to get Fedora 20 Alpha to 
> install on that same machine.  It appears to still have the same 
> problem.  Did that patch ever make it into the mainline?
>
> Strangely, the kernel from the installer (using DVD image) does NOT 
> have the problem, only the installed system.
Just to follow up on this.

The problem does not appear to be the kernel.  Installing on one disk 
succeeds in producing a bootable system every time.  Installing on all 
four disks (two 73 GB disks and two 146 GB disks) never succeeds.

Although there is a storm of ipr driver messages as before, that is
likely not the problem.

I'm going to continue on #fedora-ppc and the corresponding mailing list.

Patch

From 04771628d53e1e6883063ed21bd6825ee9680366 Mon Sep 17 00:00:00 2001
From: Gavin Shan <shangw@linux.vnet.ibm.com>
Date: Tue, 4 Jun 2013 11:47:59 +0800
Subject: [PATCH] powerpc/eeh: Don't check RTAS token to get PE addr

RTAS token "ibm,get-config-addr-info" or "ibm,get-config-addr-info2"
is used to retrieve the PE address from the PCI address, which is
made up of domain/bus/slot/function. If we don't have those 2 tokens,
the domain/bus/slot/function is used as the address for EEH
RTAS operations. Some older f/w might not have those 2 tokens, and
that blocks the EEH functionality from being initialized.

The patch skips the check on those 2 tokens so we can bring up EEH
functionality successfully. The domain/bus/slot/function will then be
used as the address for EEH RTAS operations.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 19506f9..b456b15 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -83,7 +83,11 @@  static int pseries_eeh_init(void)
 	ibm_configure_pe		= rtas_token("ibm,configure-pe");
 	ibm_configure_bridge		= rtas_token("ibm,configure-bridge");
 
-	/* necessary sanity check */
+	/*
+	 * Necessary sanity check. We needn't check "get-config-addr-info"
+	 * and its variant since the old firmware probably support address
+	 * of domain/bus/slot/function for EEH RTAS operations.
+	 */
 	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) {
 		pr_warning("%s: RTAS service <ibm,set-eeh-option> invalid\n",
 			__func__);
@@ -102,12 +106,6 @@  static int pseries_eeh_init(void)
 		pr_warning("%s: RTAS service <ibm,slot-error-detail> invalid\n",
 			__func__);
 		return -EINVAL;
-	} else if (ibm_get_config_addr_info2 == RTAS_UNKNOWN_SERVICE &&
-		   ibm_get_config_addr_info == RTAS_UNKNOWN_SERVICE) {
-		pr_warning("%s: RTAS service <ibm,get-config-addr-info2> and "
-			"<ibm,get-config-addr-info> invalid\n",
-			__func__);
-		return -EINVAL;
 	} else if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE &&
 		   ibm_configure_bridge == RTAS_UNKNOWN_SERVICE) {
 		pr_warning("%s: RTAS service <ibm,configure-pe> and "
-- 
1.7.5.4
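
For readers following the thread, the effect of the patch can be sketched in
user space. This is only an illustration under stated assumptions, not the
kernel code: struct firmware, lookup_token(), eeh_init_sketch() and
fallback_config_addr() are hypothetical names, UNKNOWN_SERVICE stands in for
RTAS_UNKNOWN_SERVICE, only the services visible in the hunks above are
checked, and the bit layout of the fallback address is an assumption made for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's RTAS_UNKNOWN_SERVICE; the value is illustrative. */
#define UNKNOWN_SERVICE (-1)

/* Hypothetical model of a firmware level: the RTAS services it exports. */
struct firmware {
    const char *const *services;
    size_t count;
};

/* Toy replacement for rtas_token(): returns a token if the service exists. */
static int lookup_token(const struct firmware *fw, const char *name)
{
    assert(fw != NULL && name != NULL);
    for (size_t i = 0; i < fw->count; i++) {
        if (strcmp(fw->services[i], name) == 0)
            return (int)i;
    }
    return UNKNOWN_SERVICE;
}

/*
 * Simplified sanity check. With require_addr_info == true it models the
 * pre-patch behaviour (missing address-lookup services are fatal); with
 * false it models the patched behaviour (that check is skipped).
 */
static int eeh_init_sketch(const struct firmware *fw, bool require_addr_info)
{
    bool have_addr_info =
        lookup_token(fw, "ibm,get-config-addr-info2") != UNKNOWN_SERVICE ||
        lookup_token(fw, "ibm,get-config-addr-info") != UNKNOWN_SERVICE;
    bool have_configure =
        lookup_token(fw, "ibm,configure-pe") != UNKNOWN_SERVICE ||
        lookup_token(fw, "ibm,configure-bridge") != UNKNOWN_SERVICE;

    if (lookup_token(fw, "ibm,set-eeh-option") == UNKNOWN_SERVICE)
        return -EINVAL;
    if (lookup_token(fw, "ibm,slot-error-detail") == UNKNOWN_SERVICE)
        return -EINVAL;
    if (require_addr_info && !have_addr_info)
        return -EINVAL;
    if (!have_configure)
        return -EINVAL;
    return 0;
}

/*
 * Fallback address built from bus/slot/function when no PE address can be
 * queried. The layout (bus in bits 16-23, devfn in bits 8-15) is an
 * assumption for this sketch; the domain is carried separately.
 */
static uint32_t fallback_config_addr(uint8_t bus, uint8_t slot, uint8_t func)
{
    uint8_t devfn = (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
    return ((uint32_t)bus << 16) | ((uint32_t)devfn << 8);
}
```

Calling eeh_init_sketch() with a firmware table that lacks both
"ibm,get-config-addr-info" services returns -EINVAL when the check is
required, which corresponds to the (-22) failure in the boot log, and 0 when
the check is skipped as the patch does.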