
[7/9] RapidIO: Add handling for PW message from a lost device

Message ID 1281712686-31308-8-git-send-email-alexandre.bounine@idt.com (mailing list archive)
State Superseded
Delegated to: Kumar Gala

Commit Message

Bounine, Alexandre Aug. 13, 2010, 3:18 p.m. UTC
Add a check that the port-write (PW) message source device is still accessible,
and change the PW message handler to recover if the source device is no longer
available (powered down or link disconnected).
To avoid losing the notification, the PW message handler scans the route back
from the source device to identify the end of the broken link.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reviewed-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
---
 drivers/rapidio/rio.c |  111 ++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/rapidio/rio.h |    2 +
 2 files changed, 108 insertions(+), 5 deletions(-)
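The route scan described in the commit message (implemented by rio_chk_dev_route() in the patch below) can be modeled outside the kernel. This is a minimal user-space sketch under stated assumptions: struct sim_dev and sim_find_broken_link() are hypothetical stand-ins, not kernel types, and a `reachable` flag replaces the maintenance read of RIO_DEV_ID_CAR. Like the patch, it climbs toward the host past unreachable switches and reports the first reachable switch together with the port that leads back toward the lost device.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical, simplified stand-in for struct rio_dev (not the kernel type). */
struct sim_dev {
	const char *name;
	int is_switch;                      /* plays the role of (pef & RIO_PEF_SWITCH) */
	int reachable;                      /* what a maintenance read would report */
	struct sim_dev *prev;               /* next device toward the host */
	int nports;
	struct sim_dev *nextdev[MAX_PORTS]; /* device attached to each switch port */
};

/*
 * Walk from @dev toward the host until a reachable upstream switch is found,
 * then report that switch and the port leading back toward @dev.
 * Returns 0 on success, -1 if no reachable switch/port was identified.
 */
static int sim_find_broken_link(struct sim_dev *dev,
				struct sim_dev **last, int *port)
{
	assert(dev != NULL && last != NULL && port != NULL);

	while (dev->prev && dev->prev->is_switch) {
		struct sim_dev *sw = dev->prev;
		int p;

		if (!sw->reachable) {   /* this switch is gone too: keep climbing */
			dev = sw;
			continue;
		}

		for (p = 0; p < sw->nports; p++)
			if (sw->nextdev[p] == dev)
				break;

		if (p == sw->nports)
			return -1;      /* switch has no port leading to @dev */

		*last = sw;
		*port = p;
		return 0;
	}
	return -1;                      /* no reachable upstream switch */
}
```

For example, if a host-side switch s0 reaches switch s1 through its port 2 and s1 has lost power, a PW message attributed to an endpoint behind s1 would be redirected to s0, port 2, which is the substitution the patch makes for rdev and portnum.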

Comments

Micha Nelissen Aug. 16, 2010, 12:29 p.m. UTC | #1
Alexandre Bounine wrote:
> Add check if PW message source device is accessible and change PW message
> handler to recover if PW message source device is not available anymore (power
> down or link disconnect).

I am not quite sure what the point is of this patch. What do you need to 
recover from?

> To avoid possible loss of notification, the PW message handler scans the route
> back from the source device to identify end of the broken link.

Do you mean if port-writes are dropped? Then they did not reach you in 
the first place. If a link in between is broken, the associated switch 
will 'complain' and send port-writes, no?

Micha
Bounine, Alexandre Aug. 16, 2010, 6:02 p.m. UTC | #2
Micha Nelissen wrote:
> 
> Alexandre Bounine wrote:
> > Add check if PW message source device is accessible and change PW message
> > handler to recover if PW message source device is not available anymore (power
> > down or link disconnect).
> 
> I am not quite sure what the point is of this patch. What do you need to
> recover from?

From a failed maintenance read. In the previous version, the PW handler had
trouble if a maintenance read request failed. Now I am trying to detect lost or
removed devices as soon as I see a broken link.

> 
> > To avoid possible loss of notification, the PW message handler scans the route
> > back from the source device to identify end of the broken link.
> 
> Do you mean if port-writes are dropped? Then they did not reach you in
> the first place. If a link in between is broken, the associated switch
> will 'complain' and send port-writes, no?

The situation I am trying to resolve applies mostly to larger systems that
have multiple complex boards (or chassis/domains) connected together. The
power-down sequence on a board (chassis), combined with the switch hierarchy,
may allow a switch to send a PW message to the host before its power is off.
This creates an orphaned PW message.
At the same time, there is no guarantee that the PW message from the
associated upstream switch will reach the host: that "real" PW message may be
dropped by the controller (the 85xx is a good example). Everything depends on
the number of PW messages directed to the host/controller. I am trying to use
the first available notification to service device removal. If the "real" PW
message is received later, it should be processed without any further action.

Alex.
Micha Nelissen Aug. 17, 2010, 7:22 a.m. UTC | #3
Bounine, Alexandre wrote:
> That "real" PW message may be dropped by the controller (85xx is good
> example). Everything depends on number of PW messages directed to the
> host/controller. I am trying to use the first available notification to
> service device removal. If the "real" PW message is received it should
> be processed without any further action. 

Perhaps an idea is to use the repeated port-write sending feature so 
that dropped port-writes are not a problem anymore.

Micha
Bounine, Alexandre Aug. 17, 2010, 12:44 p.m. UTC | #4
Micha Nelissen wrote:
> 
> Perhaps an idea is to use the repeated port-write sending feature so
> that dropped port-writes are not a problem anymore.
> 
Unfortunately, this feature is not defined by the RIO spec. It is a
proprietary function, so we cannot rely on it. Yes, this is a nice feature of
Tsi57x switches and may be used if you have a closed system: just enable it in
em_init. Part 8 of the RIO spec is quite open about port-write generation, and
we cannot expect the same behavior from different devices.

Patch

diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c
index f58df11..22f7847 100644
--- a/drivers/rapidio/rio.c
+++ b/drivers/rapidio/rio.c
@@ -495,6 +495,90 @@  int rio_set_port_lockout(struct rio_dev *rdev, u32 pnum, int lock)
 }
 
 /**
+ * rio_chk_dev_route - Validate route to the specified device.
+ * @rdev: Pointer to RIO device control structure
+ * @nrdev: Pointer to last active device on the route to rdev
+ * @npnum: nrdev port number on the route to rdev
+ *
+ * Follows a route to the specified RIO device to determine the last available
+ * device (and corresponding RIO port) on the route.
+ */
+static int
+rio_chk_dev_route(struct rio_dev *rdev, struct rio_dev **nrdev, int *npnum)
+{
+	u32 result;
+	int p_port, rc = -EIO;
+	struct rio_dev *prev = NULL;
+
+	while (rdev->prev && (rdev->prev->pef & RIO_PEF_SWITCH)) {
+		if (rio_read_config_32(rdev->prev, RIO_DEV_ID_CAR, &result)) {
+			rdev = rdev->prev;
+			continue;
+		}
+
+		prev = rdev->prev;
+		for (p_port = 0; p_port < prev->rswitch->nports; p_port++)
+			if (prev->rswitch->nextdev[p_port] == rdev)
+				break;
+
+		if (p_port < prev->rswitch->nports) {
+			pr_debug("RIO: link failed on [%s]-P%d\n",
+				 rio_name(prev), p_port);
+			*nrdev = prev;
+			*npnum = p_port;
+			rc = 0;
+		} else {
+			pr_debug("RIO: failed to trace route to %s\n",
+				 rio_name(prev));
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+/**
+ * rio_mport_chk_dev_access - Validate access to the specified device.
+ * @mport: Master port to send transactions
+ * @destid: Device destination ID in network
+ * @hopcount: Number of hops into the network
+ */
+static int
+rio_mport_chk_dev_access(struct rio_mport *mport, u16 destid, u8 hopcount)
+{
+	int i = 0;
+	u32 tmp;
+
+	while (rio_mport_read_config_32(mport, destid, hopcount,
+					RIO_DEV_ID_CAR, &tmp)) {
+		i++;
+		if (i == RIO_MAX_CHK_RETRY)
+			return -EIO;
+		mdelay(1);
+	}
+
+	return 0;
+}
+
+/**
+ * rio_chk_dev_access - Validate access to the specified device.
+ * @rdev: Pointer to RIO device control structure
+ */
+static int rio_chk_dev_access(struct rio_dev *rdev)
+{
+	u8 hopcount = 0xff;
+	u16 destid = rdev->destid;
+
+	if (rdev->rswitch) {
+		destid = rdev->rswitch->destid;
+		hopcount = rdev->rswitch->hopcount;
+	}
+
+	return rio_mport_chk_dev_access(rdev->net->hport, destid, hopcount);
+}
+
+/**
  * rio_clr_err_stopped - Clears port Error-stopped states.
  * @rdev: Pointer to RIO device control structure
  * @pnum: Switch port number to clear errors
@@ -627,8 +711,8 @@  int rio_inb_pwrite_handler(union rio_pw_msg *pw_msg)
 
 	rdev = rio_get_comptag(pw_msg->em.comptag, NULL);
 	if (rdev == NULL) {
-		/* Someting bad here (probably enumeration error) */
-		pr_err("RIO: %s No matching device for CTag 0x%08x\n",
+		/* Device removed or enumeration error */
+		pr_debug("RIO: %s No matching device for CTag 0x%08x\n",
 			__func__, pw_msg->em.comptag);
 		return -EIO;
 	}
@@ -659,6 +743,26 @@  int rio_inb_pwrite_handler(union rio_pw_msg *pw_msg)
 			return 0;
 	}
 
+	portnum = pw_msg->em.is_port & 0xFF;
+
+	/* Check if device and route to it are functional:
+	 * Sometimes devices may send PW message(s) just before being
+	 * powered down (or link being lost).
+	 */
+	if (rio_chk_dev_access(rdev)) {
+		pr_debug("RIO: device access failed - get link partner\n");
+		/* Scan route to the device and identify failed link.
+		 * This will replace device and port reported in PW message.
+		 * PW message should not be used after this point.
+		 */
+		if (rio_chk_dev_route(rdev, &rdev, &portnum)) {
+			pr_err("RIO: Route trace for %s failed\n",
+				rio_name(rdev));
+			return -EIO;
+		}
+		pw_msg = NULL;
+	}
+
 	/* For End-point devices processing stops here */
 	if (!(rdev->pef & RIO_PEF_SWITCH))
 		return 0;
@@ -676,9 +780,6 @@  int rio_inb_pwrite_handler(union rio_pw_msg *pw_msg)
 	/*
 	 * Process the port-write notification from switch
 	 */
-
-	portnum = pw_msg->em.is_port & 0xFF;
-
 	if (rdev->rswitch->em_handle)
 		rdev->rswitch->em_handle(rdev, portnum);
 
diff --git a/drivers/rapidio/rio.h b/drivers/rapidio/rio.h
index f27b7a9..bc71ba1 100644
--- a/drivers/rapidio/rio.h
+++ b/drivers/rapidio/rio.h
@@ -14,6 +14,8 @@ 
 #include <linux/list.h>
 #include <linux/rio.h>
 
+#define RIO_MAX_CHK_RETRY	3
+
 /* Functions internal to the RIO core code */
 
 extern u32 rio_mport_get_feature(struct rio_mport *mport, int local, u16 destid,