Message ID | 20181205235013.68364-1-aik@ozlabs.ru |
---|---|
State | Accepted |
Series | npu2: Return sensible PCI error when not frozen |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | master/apply_patch Successfully applied |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master |
On 6/12/18 10:50 am, Alexey Kardashevskiy wrote:
> The current kernel calls OPAL_PCI_EEH_FREEZE_STATUS with an uninitialized
> @pci_error_type parameter and then analyzes it even if the OPAL call
> returned OPAL_SUCCESS. This results in unexpected EEH events and NPU
> freezes.
>
> This initializes @pci_error_type and @severity to known safe values.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>
> The corresponding kernel patch is under review:
> https://patchwork.ozlabs.org/patch/999630/
> ---
>  hw/npu2.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/hw/npu2.c b/hw/npu2.c
> index 767306f..674ffee 100644
> --- a/hw/npu2.c
> +++ b/hw/npu2.c
> @@ -1313,8 +1313,8 @@ static struct pci_slot *npu2_slot_create(struct phb *phb)
>  int64_t npu2_freeze_status(struct phb *phb __unused,
>  			   uint64_t pe_number __unused,
>  			   uint8_t *freeze_state,
> -			   uint16_t *pci_error_type __unused,
> -			   uint16_t *severity __unused,
> +			   uint16_t *pci_error_type,
> +			   uint16_t *severity,
>  			   uint64_t *phb_status __unused)
>  {
>  	/*
> @@ -1324,6 +1324,10 @@ int64_t npu2_freeze_status(struct phb *phb __unused,
>  	 * it keeps the skiboot PCI enumeration going.
>  	 */
>  	*freeze_state = OPAL_EEH_STOPPED_NOT_FROZEN;
> +	*pci_error_type = OPAL_EEH_NO_ERROR;
> +	if (severity)
> +		*severity = OPAL_EEH_SEV_NO_ERROR;
> +
>  	return OPAL_SUCCESS;
>  }
Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> The current kernel calls OPAL_PCI_EEH_FREEZE_STATUS with an uninitialized
> @pci_error_type parameter and then analyzes it even if the OPAL call
> returned OPAL_SUCCESS. This results in unexpected EEH events and NPU
> freezes.
>
> This initializes @pci_error_type and @severity to known safe values.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Merged to master as of 3e3defbf73e3603c5dc6f168c3de764c14b50e27
diff --git a/hw/npu2.c b/hw/npu2.c
index 767306f..674ffee 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -1313,8 +1313,8 @@ static struct pci_slot *npu2_slot_create(struct phb *phb)
 int64_t npu2_freeze_status(struct phb *phb __unused,
 			   uint64_t pe_number __unused,
 			   uint8_t *freeze_state,
-			   uint16_t *pci_error_type __unused,
-			   uint16_t *severity __unused,
+			   uint16_t *pci_error_type,
+			   uint16_t *severity,
 			   uint64_t *phb_status __unused)
 {
 	/*
@@ -1324,6 +1324,10 @@ int64_t npu2_freeze_status(struct phb *phb __unused,
 	 * it keeps the skiboot PCI enumeration going.
 	 */
 	*freeze_state = OPAL_EEH_STOPPED_NOT_FROZEN;
+	*pci_error_type = OPAL_EEH_NO_ERROR;
+	if (severity)
+		*severity = OPAL_EEH_SEV_NO_ERROR;
+
 	return OPAL_SUCCESS;
 }
The current kernel calls OPAL_PCI_EEH_FREEZE_STATUS with an uninitialized
@pci_error_type parameter and then analyzes it even if the OPAL call
returned OPAL_SUCCESS. This results in unexpected EEH events and NPU
freezes.

This initializes @pci_error_type and @severity to known safe values.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

The corresponding kernel patch is under review:
https://patchwork.ozlabs.org/patch/999630/
---
 hw/npu2.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
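
The failure mode described in the commit message can be sketched in isolation. The snippet below is only an illustration, not skiboot or kernel code: `demo_freeze_status`, `demo_caller_sees_error`, and the `DEMO_*` constants are hypothetical stand-ins, and the numeric values are chosen for the example rather than taken from the OPAL headers. The caller poisons its locals to imitate uninitialized stack contents without invoking undefined behaviour, then inspects the error type after a successful return, which is the pattern the commit message attributes to the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OPAL constants; values are illustrative only. */
enum {
	DEMO_SUCCESS = 0,
	DEMO_STOPPED_NOT_FROZEN = 0,
	DEMO_NO_ERROR = 0,
	DEMO_SEV_NO_ERROR = 0,
};

/*
 * Callee side: write every output parameter before reporting success,
 * so the caller never reads a value it did not receive from us.
 */
static int64_t demo_freeze_status(uint8_t *freeze_state,
				  uint16_t *pci_error_type,
				  uint16_t *severity)
{
	assert(freeze_state != NULL && pci_error_type != NULL);

	*freeze_state = DEMO_STOPPED_NOT_FROZEN;
	*pci_error_type = DEMO_NO_ERROR;
	if (severity)	/* treated as optional, as the NULL check in the patch suggests */
		*severity = DEMO_SEV_NO_ERROR;

	return DEMO_SUCCESS;
}

/*
 * Caller side: the locals start with poison values that imitate
 * uninitialized memory. Returns nonzero if the caller would conclude
 * that an error occurred.
 */
static int demo_caller_sees_error(void)
{
	uint8_t fstate = 0xff;
	uint16_t pcierr = 0xffff;
	int64_t rc;

	rc = demo_freeze_status(&fstate, &pcierr, NULL);

	/* The error type is examined even though the call succeeded. */
	return rc != DEMO_SUCCESS || pcierr != DEMO_NO_ERROR;
}
```

If the assignment to `*pci_error_type` were dropped from `demo_freeze_status`, `demo_caller_sees_error` would report an error based on the poison value alone, which corresponds to the spurious EEH events the patch is meant to prevent.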