diff mbox series

[v2] phb4: Reset pfir and nfir if new errors reported during ETU reset

Message ID 20180917072550.4255-1-vaibhav@linux.ibm.com
State Accepted
Series [v2] phb4: Reset pfir and nfir if new errors reported during ETU reset

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/make_check success Test make_check on branch master

Commit Message

Vaibhav Jain Sept. 17, 2018, 7:25 a.m. UTC
During fast-reboot, new PEC errors can be latched even after ETU-Reset
is asserted. This leaves the nfir_cache and pfir_cache variables out of
sync with the nfir/pfir registers.

During step-2 of CRESET, the nfir_cache and pfir_cache values are used to
bring the PHB out of reset state. However, if these variables are out of
date as noted above, the nfir/pfir registers are never reset completely
and the ETU remains frozen.

Hence this patch updates step-2 of phb4_creset to re-read the values of
nfir/pfir registers to check if any new errors were reported after
ETU-reset was asserted, report these new errors and reset the
nfir/pfir registers. This should bring the ETU out of reset
successfully.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:
v2	-> Rebased the patch on
	http://patchwork.ozlabs.org/patch/970408/ to dump all PEC
	error registers.
---
 hw/phb4.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
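
The failure mode described in the commit message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not skiboot code: `struct fake_fir`, `fir_clear_cached()`, `creset_step2()` and `stale_cache_demo()` are hypothetical names, the bit positions are arbitrary, and the masked clear is assumed to drop only the bits present in the cached value (mirroring the existing write of `~p->nfir_cache`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware FIR; not a skiboot type. */
struct fake_fir {
	uint64_t value;
};

/* Assumed model of the masked clear: only bits set in 'cache' are
 * dropped, mirroring the existing write of ~nfir_cache.
 */
static void fir_clear_cached(struct fake_fir *fir, uint64_t cache)
{
	fir->value &= ~cache;
}

/* Model of step 2. With 'reread' set, the FIR is sampled again after
 * the masked clear and zeroed if anything is left, as the patch does.
 * Returns the FIR contents just before the ETU would be released.
 */
static uint64_t creset_step2(struct fake_fir *fir, uint64_t cache, int reread)
{
	fir_clear_cached(fir, cache);

	if (reread) {
		cache = fir->value;	/* refresh the stale snapshot */
		if (cache)
			fir->value = 0;	/* reset the newly latched errors */
	}
	return fir->value;
}

/* Runs both variants against the same sequence of events. */
static void stale_cache_demo(void)
{
	const uint64_t first_err = 1ull << 63;	/* arbitrary bit positions */
	const uint64_t late_err = 1ull << 58;

	struct fake_fir fir = { .value = first_err };
	uint64_t cache = fir.value;	/* snapshot taken in an earlier step */

	fir.value |= late_err;		/* error latched after the snapshot */
	assert(creset_step2(&fir, cache, 0) == late_err);	/* bit survives */

	fir.value = first_err | late_err;
	assert(creset_step2(&fir, cache, 1) == 0);		/* fully clear */
}
```

Calling `stale_cache_demo()` exercises both paths: without the re-read the late error bit survives the masked clear, while with the re-read the modelled register is empty before the reset is released.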

Comments

Vasant Hegde Sept. 18, 2018, 10:32 a.m. UTC | #1
On 09/17/2018 12:55 PM, Vaibhav Jain wrote:
> During fast-reboot, new PEC errors can be latched even after ETU-Reset
> is asserted. This leaves the nfir_cache and pfir_cache variables out of
> sync with the nfir/pfir registers.
>
> During step-2 of CRESET, the nfir_cache and pfir_cache values are used to
> bring the PHB out of reset state. However, if these variables are out of
> date as noted above, the nfir/pfir registers are never reset completely
> and the ETU remains frozen.
> 
> Hence this patch updates step-2 of phb4_creset to re-read the values of
> nfir/pfir registers to check if any new errors were reported after
> ETU-reset was asserted, report these new errors and reset the
> nfir/pfir registers. This should bring the ETU out of reset
> successfully.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

Tested-By: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

-Vasant
Oliver O'Halloran Sept. 19, 2018, 5:22 a.m. UTC | #2
On Mon, Sep 17, 2018 at 5:25 PM, Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
> During fast-reboot, new PEC errors can be latched even after ETU-Reset
> is asserted. This leaves the nfir_cache and pfir_cache variables out of
> sync with the nfir/pfir registers.
>
> During step-2 of CRESET, the nfir_cache and pfir_cache values are used to
> bring the PHB out of reset state. However, if these variables are out of
> date as noted above, the nfir/pfir registers are never reset completely
> and the ETU remains frozen.
>
> Hence this patch updates step-2 of phb4_creset to re-read the values of
> nfir/pfir registers to check if any new errors were reported after
> ETU-reset was asserted, report these new errors and reset the
> nfir/pfir registers. This should bring the ETU out of reset
> successfully.
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Change-log:
> v2      -> Rebased the patch on
>         http://patchwork.ozlabs.org/patch/970408/ to dump all PEC
>         error registers.

Looks good to me.

Reviewed-by: Oliver O'Halloran <oohall@gmail.com>

> ---
>  hw/phb4.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index cf3d0f84..3b1a755c 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -3160,6 +3160,25 @@ static int64_t phb4_creset(struct pci_slot *slot)
>                         xscom_write(p->chip_id, p->pe_stk_xscom + 0x1,
>                                     ~p->nfir_cache);
>
> +                       /* Re-read errors in PFIR and NFIR and reset any new
> +                        * error reported.
> +                        */
> +                       xscom_read(p->chip_id, p->pci_stk_xscom +
> +                                  XPEC_PCI_STK_PCI_FIR, &p->pfir_cache);
> +                       xscom_read(p->chip_id, p->pe_stk_xscom +
> +                                  XPEC_NEST_STK_PCI_NFIR, &p->nfir_cache);
> +
> +                       if (p->pfir_cache || p->nfir_cache) {
> +                               PHBERR(p, "CRESET: PHB still fenced !!\n");
> +                               phb4_dump_pec_err_regs(p);
> +
> +                               /* Reset the PHB errors */
> +                               xscom_write(p->chip_id, p->pci_stk_xscom +
> +                                           XPEC_PCI_STK_PCI_FIR, 0);
> +                               xscom_write(p->chip_id, p->pe_stk_xscom +
> +                                           XPEC_NEST_STK_PCI_NFIR, 0);
> +                       }
> +
>                         /* Clear PHB from reset */
>                         xscom_write(p->chip_id,
>                                     p->pci_stk_xscom + XPEC_PCI_STK_ETU_RESET, 0x0);
> --
> 2.17.1
>
Stewart Smith Sept. 20, 2018, 5:58 a.m. UTC | #3
Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> During fast-reboot, new PEC errors can be latched even after ETU-Reset
> is asserted. This leaves the nfir_cache and pfir_cache variables out of
> sync with the nfir/pfir registers.
>
> During step-2 of CRESET, the nfir_cache and pfir_cache values are used to
> bring the PHB out of reset state. However, if these variables are out of
> date as noted above, the nfir/pfir registers are never reset completely
> and the ETU remains frozen.
>
> Hence this patch updates step-2 of phb4_creset to re-read the values of
> nfir/pfir registers to check if any new errors were reported after
> ETU-reset was asserted, report these new errors and reset the
> nfir/pfir registers. This should bring the ETU out of reset
> successfully.
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

Thanks, merged to master as of c7be508115db0c903f7054c68beb0349b09592e0

Patch

diff --git a/hw/phb4.c b/hw/phb4.c
index cf3d0f84..3b1a755c 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -3160,6 +3160,25 @@  static int64_t phb4_creset(struct pci_slot *slot)
 			xscom_write(p->chip_id, p->pe_stk_xscom + 0x1,
 				    ~p->nfir_cache);
 
+			/* Re-read errors in PFIR and NFIR and reset any new
+			 * error reported.
+			 */
+			xscom_read(p->chip_id, p->pci_stk_xscom +
+				   XPEC_PCI_STK_PCI_FIR, &p->pfir_cache);
+			xscom_read(p->chip_id, p->pe_stk_xscom +
+				   XPEC_NEST_STK_PCI_NFIR, &p->nfir_cache);
+
+			if (p->pfir_cache || p->nfir_cache) {
+				PHBERR(p, "CRESET: PHB still fenced !!\n");
+				phb4_dump_pec_err_regs(p);
+
+				/* Reset the PHB errors */
+				xscom_write(p->chip_id, p->pci_stk_xscom +
+					    XPEC_PCI_STK_PCI_FIR, 0);
+				xscom_write(p->chip_id, p->pe_stk_xscom +
+					    XPEC_NEST_STK_PCI_NFIR, 0);
+			}
+
 			/* Clear PHB from reset */
 			xscom_write(p->chip_id,
 				    p->pci_stk_xscom + XPEC_PCI_STK_ETU_RESET, 0x0);