[16/16] npu2-opencapi: Log a warning when resetting a broken device

Message ID 20190909123151.21944-17-fbarrat@linux.ibm.com
State New
Series
  • opencapi: enable card reset and link retraining

Checks

Context Check Description
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (470ffb5f29d741c3bed600f7bb7bf0cbb270e05a)

Commit Message

Frederic Barrat Sept. 9, 2019, 12:31 p.m. UTC
On P9, the NPU doesn't support recovery if the link goes down
unexpectedly; that path was never fully verified. We mark the device
as broken when we receive an error interrupt from the NPU. However,
nothing prevents the OS from trying to reset the device afterwards.
The reset may or may not work, since this is unsupported territory,
so let's log a message to make that clear, as it could help when
debugging. We haven't hit any case where the reset goes badly enough
that we'd want to prevent it, so allow it for now. We can revisit
later if we have evidence that it causes more problems than it is
worth.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 hw/npu2-opencapi.c | 4 ++++
 1 file changed, 4 insertions(+)

Patch

diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
index 46aeb6d3..f044fdbf 100644
--- a/hw/npu2-opencapi.c
+++ b/hw/npu2-opencapi.c
@@ -1246,6 +1246,10 @@  static int64_t npu2_opencapi_freset(struct pci_slot *slot)
 			OCAPIINF(dev, "no card detected\n");
 			return OPAL_SUCCESS;
 		}
+		if (dev->flags & NPU2_DEV_BROKEN) {
+			OCAPIERR(dev, "Resetting a device which hit a previous error. Device recovery is not supported, so future behavior is undefined\n");
+			dev->flags &= ~NPU2_DEV_BROKEN;
+		}
 		slot->link_retries = OCAPI_LINK_TRAINING_RETRIES;
 		/* fall-through */
 	case OCAPI_SLOT_FRESET_INIT:
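
For readers without the skiboot tree at hand, the flag handling in the hunk above can be sketched in isolation. This is only an illustration under stated assumptions: `struct demo_dev`, `DEMO_DEV_BROKEN`, `demo_mark_broken()` and `demo_reset()` are hypothetical stand-ins, not the real `struct npu2_dev`, `NPU2_DEV_BROKEN`, or `npu2_opencapi_freset()`, and `fprintf` stands in for `OCAPIERR`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the NPU2_DEV_BROKEN flag bit. */
#define DEMO_DEV_BROKEN 0x1u

/* Hypothetical stand-in for the device structure; only the flags matter here. */
struct demo_dev {
	uint32_t flags;
};

/* What the error interrupt path is described as doing: mark the device broken. */
static void demo_mark_broken(struct demo_dev *dev)
{
	dev->flags |= DEMO_DEV_BROKEN;
}

/*
 * What the reset path does after this patch: if the device was marked
 * broken, log a warning and clear the flag, then let the reset proceed.
 * Returns 1 if a warning was logged, 0 otherwise.
 */
static int demo_reset(struct demo_dev *dev)
{
	int warned = 0;

	if (dev->flags & DEMO_DEV_BROKEN) {
		fprintf(stderr, "Resetting a device which hit a previous error\n");
		dev->flags &= ~DEMO_DEV_BROKEN;
		warned = 1;
	}
	/* The actual reset sequence would continue here. */
	return warned;
}
```

Because the flag is cleared during the reset, the warning is logged once per error, and a second reset stays quiet unless another error interrupt arrives in between.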