[1/6] xive: Check for valid PIR index when decoding

Message ID 20170810065843.13893-1-ruscur@russell.cc
State Accepted
Headers show

Commit Message

Russell Currey Aug. 10, 2017, 6:58 a.m.
This fixes an unlikely but possible assert() fail on kdump.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell Currey <ruscur@russell.cc>
---
 hw/xive.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Andrew Donnellan Aug. 10, 2017, 8:13 a.m. | #1
On 10/08/17 16:58, Russell Currey wrote:
> This fixes an unlikely but possible assert() fail on kdump.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Russell Currey <ruscur@russell.cc>

Where does the assert() happen?

Regardless this looks like a pretty reasonable thing to check, so...

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>


> ---
>   hw/xive.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/hw/xive.c b/hw/xive.c
> index 03b9478e..18636723 100644
> --- a/hw/xive.c
> +++ b/hw/xive.c
> @@ -601,6 +601,8 @@ static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx,
>   
>   	/* PIR case */
>   	if (((vp >> 30) & 1) == 0) {
> +		if (find_cpu_by_pir(index) == NULL)
> +			return false;
>   		if (blk)
>   			*blk = PIR2VP_BLK(index);
>   		if (idx)
> @@ -656,6 +658,8 @@ static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx,
>   
>   	/* PIR case */
>   	if (((vp >> 30) & 1) == 0) {
> +		if (find_cpu_by_pir(index) == NULL)
> +			return false;
>   		if (blk)
>   			*blk = PIR2VP_BLK(index);
>   		if (idx)
>
Andrew Donnellan Aug. 10, 2017, 8:16 a.m. | #2
On 10/08/17 16:58, Russell Currey wrote:
> If a PHB is marked broken it didn't work on boot, and if it didn't work
> on boot then there's no point trying to recover it later

Can't a PHB also end up in BROKEN when a creset fails? Per the error: 
label in phb4_creset()...

> 
> Signed-off-by: Russell Currey <ruscur@russell.cc>

In any case this seems sensible.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Would phb3 benefit from a similar change?

> ---
>   hw/phb4.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 7e8f68bf..012a8cdc 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -2608,6 +2608,10 @@ static int64_t phb4_creset(struct pci_slot *slot)
>   	struct phb4 *p = phb_to_phb4(slot->phb);
>   	uint64_t pbcq_status, reg;
>   
> +	/* Don't even try fixing a broken PHB */
> +	if (p->state == PHB4_STATE_BROKEN)
> +		return OPAL_HARDWARE;
> +
>   	switch (slot->state) {
>   	case PHB4_SLOT_NORMAL:
>   	case PHB4_SLOT_CRESET_START:
>
Andrew Donnellan Aug. 10, 2017, 8:18 a.m. | #3
On 10/08/17 16:58, Russell Currey wrote:
> phb4_creset() is typically called by functions that prepare the link
> to go down.  In cases where creset() is called directly by the kernel,
> this isn't the case and it can cause issues.  Prepare for link down in
> creset, just like we do in freset and hreset.
> 
> Signed-off-by: Russell Currey <ruscur@russell.cc>

Looks reasonable

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>   hw/phb4.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 012a8cdc..b467e369 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -2623,6 +2623,7 @@ static int64_t phb4_creset(struct pci_slot *slot)
>   			do_capp_recovery_scoms(p);
>   #endif
>   
> +		phb4_prepare_link_change(slot, false);
>   		/* Clear error inject register, preventing recursive errors */
>   		xscom_write(p->chip_id, p->pe_xscom + 0x2, 0x0);
>   
>
Andrew Donnellan Aug. 10, 2017, 8:19 a.m. | #4
On 10/08/17 16:58, Russell Currey wrote:
> If a PHB is being completely reset, its state is about to be blown away
> anyway, so if it's not in an appropriate state, creset it regardless.
> 
> Signed-off-by: Russell Currey <ruscur@russell.cc>

Though hitting this path is probably still a bug somewhere?

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>   hw/phb4.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/phb4.c b/hw/phb4.c
> index b467e369..d13ab8c3 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -2700,8 +2700,11 @@ static int64_t phb4_creset(struct pci_slot *slot)
>   		pci_slot_set_state(slot, PHB4_SLOT_NORMAL);
>   		return slot->ops.freset(slot);
>   	default:
> -		PHBERR(p, "CRESET: Unexpected slot state %08x\n",
> +		PHBERR(p, "CRESET: Unexpected slot state %08x, resetting...\n",
>   		       slot->state);
> +		pci_slot_set_state(slot, PHB4_SLOT_NORMAL);
> +		return slot->ops.creset(slot);
> +
>   	}
>   
>   error:
>
Andrew Donnellan Aug. 10, 2017, 8:22 a.m. | #5
On 10/08/17 16:58, Russell Currey wrote:
> These registers are supposed to be 16bit, and it makes part of the
> register dump misleading.
> 
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
>   hw/phb4.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/phb4.c b/hw/phb4.c
> index d13ab8c3..4c7bfe3b 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -1906,7 +1906,7 @@ static void phb4_read_phb_status(struct phb4 *p,
>   static void phb4_eeh_dump_regs(struct phb4 *p)
>   {
>   	struct OpalIoPhb4ErrorData *s;
> -	uint32_t reg;
> +	uint16_t reg;
>   	unsigned int i;
>   
>   	if (!verbose_eeh)
> @@ -1929,9 +1929,9 @@ static void phb4_eeh_dump_regs(struct phb4 *p)
>   	PHBERR(p, "       uncorrErrorStatus = %08x\n", s->uncorrErrorStatus);
>   
>   	/* Two non OPAL API registers that are useful */
> -	phb4_pcicfg_read32(&p->phb, 0, p->ecap + PCICAP_EXP_DEVCTL, &reg);
> +	phb4_pcicfg_read16(&p->phb, 0, p->ecap + PCICAP_EXP_DEVCTL, &reg);
>   	PHBERR(p, "                  devctl = %08x\n", reg);
> -	phb4_pcicfg_read32(&p->phb, 0, p->ecap + PCICAP_EXP_DEVSTAT,
> +	phb4_pcicfg_read16(&p->phb, 0, p->ecap + PCICAP_EXP_DEVSTAT,
>   			   &reg);
>   	PHBERR(p, "                 devStat = %08x\n", reg);

Do these need to be changed to %04x?
Stewart Smith Aug. 15, 2017, 12:52 a.m. | #6
Russell Currey <ruscur@russell.cc> writes:
> This fixes an unlikely but possible assert() fail on kdump.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Russell Currey <ruscur@russell.cc>

Series merged to master as of 8bdfaa24f88bd8800385a6997bb178dfdd970648

Patch

diff --git a/hw/xive.c b/hw/xive.c
index 03b9478e..18636723 100644
--- a/hw/xive.c
+++ b/hw/xive.c
@@ -601,6 +601,8 @@  static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx,
 
 	/* PIR case */
 	if (((vp >> 30) & 1) == 0) {
+		if (find_cpu_by_pir(index) == NULL)
+			return false;
 		if (blk)
 			*blk = PIR2VP_BLK(index);
 		if (idx)
@@ -656,6 +658,8 @@  static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx,
 
 	/* PIR case */
 	if (((vp >> 30) & 1) == 0) {
+		if (find_cpu_by_pir(index) == NULL)
+			return false;
 		if (blk)
 			*blk = PIR2VP_BLK(index);
 		if (idx)