[RFC,2/2] phb4: Use ASB for read/writes to PHB registers on resets

Message ID 20171002222301.22832-3-gpiccoli@linux.vnet.ibm.com
State New
Headers show
Series
  • Some changes in phb4
Related show

Commit Message

Guilherme G. Piccoli Oct. 2, 2017, 10:23 p.m.
If we are performing a PHB reset and coming for a bad situation,
(and PHB is fenced), we fail because we don't use ASB by default
in the phb4_init_hw().

This patch makes use of this trick when we are resetting PHBs
to workaround possible problems in PHB (aka it might be fenced)
and at least try to perform the complete initialization.

This proved useful in kdump kernel boot on POWER9 (wspoon).

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---

Tested on wspoon DD 2.01, worked fine. But...after kdump is complete,
system does not reboot automatically. I'll investigate too, could be
another problem or (probably) something due to this change.

I thought in sending it anyway, as-is, in order to get feedback
and stop my investigation in this direction if it does not worth.
Thanks in advance!

 hw/phb4.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Michael Neuling Oct. 3, 2017, 12:38 a.m. | #1
On Mon, 2017-10-02 at 19:23 -0300, Guilherme G. Piccoli wrote:
> If we are performing a PHB reset and coming for a bad situation,
> (and PHB is fenced), we fail because we don't use ASB by default
> in the phb4_init_hw().
> 
> This patch makes use of this trick when we are resetting PHBs
> to workaround possible problems in PHB (aka it might be fenced)
> and at least try to perform the complete initialization.
> 
> This proved useful in kdump kernel boot on POWER9 (wspoon).
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
> 
> Tested on wspoon DD 2.01, worked fine. But...after kdump is complete,
> system does not reboot automatically. I'll investigate too, could be
> another problem or (probably) something due to this change.
> 
> I thought in sending it anyway, as-is, in order to get feedback
> and stop my investigation in this direction if it does not worth.
> Thanks in advance!
> 
>  hw/phb4.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 99902d3a..12312a5c 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -4312,6 +4312,10 @@ static void phb4_init_hw(struct phb4 *p, bool
> first_init)
>  
>  	PHBDBG(p, "Initializing PHB4...\n");
>  
> +	/* Use ASB to try initialize even if the PHB is fenced. */
> +	if (!first_init)
> +		p->flags |= PHB4_CFG_USE_ASB;
> +

I'm not sure I'm keen on this solution.  We have this code in the creset path
which is calling phb4_init_hw()

	case PHB4_SLOT_CRESET_REINIT:
		PHBDBG(p, "CRESET: Reinitialization\n");
		p->flags &= ~PHB4_AIB_FENCED;
		p->flags &= ~PHB4_CAPP_RECOVERY;
		p->flags &= ~PHB4_CFG_USE_ASB;
		phb4_init_hw(p, false);
		pci_slot_set_state(slot, PHB4_SLOT_CRESET_FRESET);
		return pci_slot_set_sm_timeout(slot, msecs_to_tb(100));

It seems we are clearing it, and then you are setting it immediately after. 
Seems like the initial clearing would be wrong and we're just papering over the
issue.

Mikey


>  	/* Init_1 - Sync reset
>  	 *
>  	 * At this point we assume the PHB has already been reset.
> @@ -4387,6 +4391,9 @@ static void phb4_init_hw(struct phb4 *p, bool
> first_init)
>  
>  	phb4_write_reg(p, PHB_CTRLR, val);
>  
> +        if (!first_init)
> +                p->flags &= ~PHB4_CFG_USE_ASB;
> +
>  	/* Init_18..41 - Architected IODA3 inits */
>  	phb4_init_ioda3(p);
>
Guilherme G. Piccoli Oct. 3, 2017, 2:28 a.m. | #2
On 10/02/2017 09:38 PM, Michael Neuling wrote:
> [...] 
> I'm not sure I'm keen on this solution.  We have this code in the creset path
> which is calling phb4_init_hw()
> 
> 	case PHB4_SLOT_CRESET_REINIT:
> 		PHBDBG(p, "CRESET: Reinitialization\n");
> 		p->flags &= ~PHB4_AIB_FENCED;
> 		p->flags &= ~PHB4_CAPP_RECOVERY;
> 		p->flags &= ~PHB4_CFG_USE_ASB;
> 		phb4_init_hw(p, false);
> 		pci_slot_set_state(slot, PHB4_SLOT_CRESET_FRESET);
> 		return pci_slot_set_sm_timeout(slot, msecs_to_tb(100));
> 
> It seems we are clearing it, and then you are setting it immediately after. 
> Seems like the initial clearing would be wrong and we're just papering over the
> issue.
> 

Thanks for the review Mikey! I saw this code, and you're right - we
might remove it! But even if we don't clear the ASB flag...it does not
mean we'll have it set, correct?

So, perhaps instead of clearing we should _set it_ on the
switch..tomorrow I'll do more experiments on this, since it seems a
valid path of investigation.

Cheers,


Guilherme

> Mikey
> 
> 
>>  	/* Init_1 - Sync reset
>>  	 *
>>  	 * At this point we assume the PHB has already been reset.
>> @@ -4387,6 +4391,9 @@ static void phb4_init_hw(struct phb4 *p, bool
>> first_init)
>>  
>>  	phb4_write_reg(p, PHB_CTRLR, val);
>>  
>> +        if (!first_init)
>> +                p->flags &= ~PHB4_CFG_USE_ASB;
>> +
>>  	/* Init_18..41 - Architected IODA3 inits */
>>  	phb4_init_ioda3(p);
>>
Benjamin Herrenschmidt Oct. 3, 2017, 1:50 p.m. | #3
On Mon, 2017-10-02 at 19:23 -0300, Guilherme G. Piccoli wrote:
> If we are performing a PHB reset and coming for a bad situation,
> (and PHB is fenced), we fail because we don't use ASB by default
> in the phb4_init_hw().
> 
> This patch makes use of this trick when we are resetting PHBs
> to workaround possible problems in PHB (aka it might be fenced)
> and at least try to perform the complete initialization.
> 
> This proved useful in kdump kernel boot on POWER9 (wspoon).

We should never be calling init_hw while fenced. If we do then the bug
is elsewhere. The reset code should be checking for fences already and
clearing them if needed.
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
> 
> Tested on wspoon DD 2.01, worked fine. But...after kdump is complete,
> system does not reboot automatically. I'll investigate too, could be
> another problem or (probably) something due to this change.
> 
> I thought in sending it anyway, as-is, in order to get feedback
> and stop my investigation in this direction if it does not worth.
> Thanks in advance!
> 
>  hw/phb4.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 99902d3a..12312a5c 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -4312,6 +4312,10 @@ static void phb4_init_hw(struct phb4 *p, bool first_init)
>  
>  	PHBDBG(p, "Initializing PHB4...\n");
>  
> +	/* Use ASB to try initialize even if the PHB is fenced. */
> +	if (!first_init)
> +		p->flags |= PHB4_CFG_USE_ASB;
> +
>  	/* Init_1 - Sync reset
>  	 *
>  	 * At this point we assume the PHB has already been reset.
> @@ -4387,6 +4391,9 @@ static void phb4_init_hw(struct phb4 *p, bool first_init)
>  
>  	phb4_write_reg(p, PHB_CTRLR, val);
>  
> +        if (!first_init)
> +                p->flags &= ~PHB4_CFG_USE_ASB;
> +
>  	/* Init_18..41 - Architected IODA3 inits */
>  	phb4_init_ioda3(p);
>
Benjamin Herrenschmidt Oct. 3, 2017, 1:53 p.m. | #4
On Tue, 2017-10-03 at 11:38 +1100, Michael Neuling wrote:
> I'm not sure I'm keen on this solution.  We have this code in the creset path
> which is calling phb4_init_hw()
> 
>         case PHB4_SLOT_CRESET_REINIT:
>                 PHBDBG(p, "CRESET: Reinitialization\n");
>                 p->flags &= ~PHB4_AIB_FENCED;
>                 p->flags &= ~PHB4_CAPP_RECOVERY;
>                 p->flags &= ~PHB4_CFG_USE_ASB;
>                 phb4_init_hw(p, false);
>                 pci_slot_set_state(slot, PHB4_SLOT_CRESET_FRESET);
>                 return pci_slot_set_sm_timeout(slot, msecs_to_tb(100));
> 
> It seems we are clearing it, and then you are setting it immediately after. 
> Seems like the initial clearing would be wrong and we're just papering over the
> issue.

No. We should only enter that state after we have successfully cleared
the fence already by resetting the logic.

If the fence is still up, we have a problem but all I see is papering
over it, not understanding and fixing it properly.

We could add a proper fence check after clearing the flags and before
doing a reinit, and maybe trying to go back to PHB4_SLOT_CRESET_START
if we git (with a retry count) but it would be useful to understand
how come we end up there fenced.

You might need to work with Ric on sorting out the sequence here.

Ben.
Guilherme G. Piccoli Oct. 3, 2017, 2:04 p.m. | #5
On 10/03/2017 10:50 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-10-02 at 19:23 -0300, Guilherme G. Piccoli wrote:
>> If we are performing a PHB reset and coming for a bad situation,
>> (and PHB is fenced), we fail because we don't use ASB by default
>> in the phb4_init_hw().
>>
>> This patch makes use of this trick when we are resetting PHBs
>> to workaround possible problems in PHB (aka it might be fenced)
>> and at least try to perform the complete initialization.
>>
>> This proved useful in kdump kernel boot on POWER9 (wspoon).
> 
> We should never be calling init_hw while fenced. If we do then the bug
> is elsewhere. The reset code should be checking for fences already and
> clearing them if needed.

Thanks for your good review, I'll forget about this path, since it's
just "hiding" the issue, and follow your suggestions!

Thanks,


Guilherme

Patch

diff --git a/hw/phb4.c b/hw/phb4.c
index 99902d3a..12312a5c 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -4312,6 +4312,10 @@  static void phb4_init_hw(struct phb4 *p, bool first_init)
 
 	PHBDBG(p, "Initializing PHB4...\n");
 
+	/* Use ASB to try initialize even if the PHB is fenced. */
+	if (!first_init)
+		p->flags |= PHB4_CFG_USE_ASB;
+
 	/* Init_1 - Sync reset
 	 *
 	 * At this point we assume the PHB has already been reset.
@@ -4387,6 +4391,9 @@  static void phb4_init_hw(struct phb4 *p, bool first_init)
 
 	phb4_write_reg(p, PHB_CTRLR, val);
 
+        if (!first_init)
+                p->flags &= ~PHB4_CFG_USE_ASB;
+
 	/* Init_18..41 - Architected IODA3 inits */
 	phb4_init_ioda3(p);