sfc: Make temperature warnings/alarms more explicit.

Message ID 1240930084.10689.39.camel@localhost.localdomain
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Jesper Dangaard Brouer April 28, 2009, 2:48 p.m. UTC
The sfc driver can detect different hardware failures via the
LM87 system.  One of the failures I have experienced is the
temperature alarm, but the error message didn't reveal that this
error was temperature related.  I had to read the code to
discover that.

I think that the temperature error should be more explicit, in
order to warn people before the board is permanently damaged.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
---

 drivers/net/sfc/boards.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 30, 2009, 12:50 a.m. UTC | #1
From: Jesper Dangaard Brouer <jdb@comx.dk>
Date: Tue, 28 Apr 2009 16:48:04 +0200

> 
> The sfc driver can detect different hardware failures via the
> LM87 system.  One of the failures I have experienced is the
> temperature alarm, but the error message didn't reveal that this
> error was temperature related.  I had to read the code to
> discover that.
> 
> I think that the temperature error should be more explicit, in
> order to warn people before the board is permanently damaged.
> 
> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

Ben, ACK or something?

Ben Hutchings April 30, 2009, 1:25 a.m. UTC | #2
On Tue, 2009-04-28 at 16:48 +0200, Jesper Dangaard Brouer wrote:
> The sfc driver can detect different hardware failures via the
> LM87 system.  One of the failures I have experienced is the
> temperature alarm, but the error message didn't reveal that this
> error was temperature related.  I had to read the code to
> discover that.
> 
> I think that the temperature error should be more explicit, in
> order to warn people before the board is permanently damaged.

You are right, but...

> diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c
> index 4a4c74c..b1822fe 100644
> --- a/drivers/net/sfc/boards.c
> +++ b/drivers/net/sfc/boards.c
> @@ -121,8 +121,10 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask)
>  	if (alarms1 || alarms2) {
>  		EFX_ERR(efx,
>  			"LM87 detected a hardware failure (status %02x:%02x)"
> -			"%s%s\n",
> +			"%s%s%s\n",
>  			alarms1, alarms2,
> +			(alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1))
> +			 ? " high temperature" : "",
>  			(alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "",
>  			(alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : "");
>  		return -ERANGE;

We could be more explicit still.  How about:

		EFX_ERR(efx,
			"%s out of range (LM87 status %02x:%02x)\n",
			(alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" :
			(alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature" :
			"Voltage",
			alarms1, alarms2);

Ben.

Patch

diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c
index 4a4c74c..b1822fe 100644
--- a/drivers/net/sfc/boards.c
+++ b/drivers/net/sfc/boards.c
@@ -121,8 +121,10 @@  static int efx_check_lm87(struct efx_nic *efx, unsigned mask)
 	if (alarms1 || alarms2) {
 		EFX_ERR(efx,
 			"LM87 detected a hardware failure (status %02x:%02x)"
-			"%s%s\n",
+			"%s%s%s\n",
 			alarms1, alarms2,
+			(alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1))
+			 ? " high temperature" : "",
 			(alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "",
 			(alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : "");
 		return -ERANGE;