hw/prd: Hold FSP notifications while PRD is inactive

Message ID 20200320081849.6651-1-oohall@gmail.com
State New

Checks

Context Check Description
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (e19dddc58280e6120459053dfcbf9c026b0ac4f9)

Commit Message

Oliver O'Halloran March 20, 2020, 8:18 a.m. UTC
On FSP systems we rely on a service on the FSP to send us a notification
when the OCCs become active. On systems with NVDIMMs this is especially
critical because the OCC is responsible for starting the NVDIMM save
procedure when power fails.

The message sent from the FSP isn't sent to OPAL itself; rather, it's
sent to the PRD service running on the host (via OPAL). If this service
is not running, OPAL will currently send an error response back to the
FSP and drop the message. This causes problems because the OCC-active
message is generally sent while OPAL is still booting the system, so
the PRD daemon never gets notified that the OCC is active.

Once the OS is running we rely on PRD to report the protection status
of the NVDIMMs on the system. However, because it never receives the
notification from the FSP, it will always report the DIMMs as
unprotected because it thinks the OCCs are inactive.

This patch fixes the issue by allowing a single message to be held in
OPAL while PRD is inactive. Once OPAL receives a notification that PRD
has started, we deliver the message.
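
For readers who want the idea without the skiboot context, here is a
minimal standalone sketch of the hold-then-deliver pattern described
above. It is not the code in hw/prd.c: struct notify_slot, notify(),
consumer_started(), the NOTIFY_* codes and HELD_MSG_MAX are all
hypothetical names invented for this illustration, and delivery is
modelled as a simple counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HELD_MSG_MAX 64	/* hypothetical payload limit for this sketch */

/* Hypothetical return codes; they only loosely mirror the FSP statuses. */
enum notify_rc { NOTIFY_OK = 0, NOTIFY_BUSY = 1, NOTIFY_ERROR = 2 };

/* Hypothetical stand-in for the state the patch keeps under events_lock. */
struct notify_slot {
	bool consumer_active;	/* plays the role of prd_active */
	bool held;		/* a message is waiting for the consumer */
	size_t len;
	unsigned char buf[HELD_MSG_MAX];
	unsigned int delivered;	/* messages handed to the consumer so far */
};

/* Model "queue the message to the consumer" as bumping a counter. */
static void deliver(struct notify_slot *s)
{
	s->delivered++;
	s->held = false;
}

/* Called when a notification arrives from the sender. */
enum notify_rc notify(struct notify_slot *s, const void *data, size_t len)
{
	assert(s != NULL && data != NULL);

	if (len > HELD_MSG_MAX)
		return NOTIFY_ERROR;

	/* Always stage the payload first, as the patch does. */
	memcpy(s->buf, data, len);
	s->len = len;

	if (!s->consumer_active) {
		/* Keep the message; it is delivered when the consumer starts. */
		s->held = true;
		return NOTIFY_BUSY;
	}

	deliver(s);
	return NOTIFY_OK;
}

/* Called when the consumer announces that it has started. */
void consumer_started(struct notify_slot *s)
{
	s->consumer_active = true;
	if (s->held)
		deliver(s);
}
```

A notification that arrives before consumer_started() returns
NOTIFY_BUSY and stays in the slot; the later call to
consumer_started() flushes it.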

It's worth pointing out that this is kind of janky and brittle and would
probably break horribly if FSP notify messages were multi-part since
we could end up in a situation where only a single part of a multi-part
message is queued, with the rest being dropped. However, the only user
of the FSP notification message appears to be the OCC, and the OCC team
says it's not a problem. I'll take their word for it.
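
To make the multi-part hazard concrete, here is a small standalone
sketch of a one-entry holding slot. Everything in it (struct
single_slot, hold_part(), PART_MAX) is hypothetical and exists only
for illustration. It arbitrarily assumes that a newer part overwrites
the held one; which part would actually survive in OPAL is not
something this sketch claims to know.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PART_MAX 16	/* hypothetical size of one message part */

/* Hypothetical one-entry holding area, like the single held message. */
struct single_slot {
	bool occupied;
	char part[PART_MAX];
	unsigned int dropped;	/* parts lost because the slot was full */
};

/* Hold one part while the consumer is inactive. */
void hold_part(struct single_slot *s, const char *part)
{
	assert(s != NULL && part != NULL);

	if (s->occupied)
		s->dropped++;	/* the previously held part is overwritten */

	strncpy(s->part, part, PART_MAX - 1);
	s->part[PART_MAX - 1] = '\0';
	s->occupied = true;
}
```

Holding "part-1" and then "part-2" before the consumer starts leaves
only "part-2" in the slot, with dropped == 1, so the consumer would
see an incomplete message.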

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 hw/prd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Patch

diff --git a/hw/prd.c b/hw/prd.c
index b9d04c79d543..a9c3b34c27ce 100644
--- a/hw/prd.c
+++ b/hw/prd.c
@@ -374,7 +374,7 @@  int prd_hbrt_fsp_msg_notify(void *data, u32 dsize)
 	int size, fw_notify_size;
 	int rc = FSP_STATUS_GENERIC_ERROR;
 
-	if (!prd_enabled || !prd_active) {
+	if (!prd_enabled) {
 		prlog(PR_NOTICE, "PRD: %s: PRD daemon is not ready\n",
 		      __func__);
 		return rc;
@@ -415,6 +415,12 @@  int prd_hbrt_fsp_msg_notify(void *data, u32 dsize)
 	fw_notify->type = cpu_to_be64(PRD_FW_MSG_TYPE_HBRT_FSP);
 	memcpy(&(fw_notify->mbox_msg), data, dsize);
 
+	if (!prd_active) {
+		// save the message, we'll deliver it when prd starts
+		rc = FSP_STATUS_BUSY;
+		goto unlock_events;
+	}
+
 	rc = opal_queue_prd_msg(prd_msg_fsp_notify);
 	if (!rc)
 		prd_msg_inuse = true;
@@ -455,6 +461,11 @@  static int prd_msg_handle_init(struct opal_prd_msg *msg)
 	 * interrupts */
 	lock(&events_lock);
 	prd_active = true;
+
+	if (prd_msg_fsp_notify) {
+		if (!opal_queue_prd_msg(prd_msg_fsp_notify))
+			prd_msg_inuse = true;
+	}
 	if (!prd_msg_inuse)
 		send_next_pending_event();
 	unlock(&events_lock);